Reproducible Workflows and Compute Environments for Reusable Datasets, Simulations and Research Software

crossref(2024)

Abstract
The pursuit of reproducibility in research has long been emphasized. It is even more critical in geohazards research and practice, where model-based decision-making needs to be transparent for trustworthy applications. However, enabling reproducibility in process-based or machine learning workflows requires time and effort, and sometimes manual operations or resources that are unavailable. Moreover, the diversity of modern compute environments, in both hardware and software, significantly hinders the path to reproducibility. While many researchers focus on reproducibility, we advocate that reusability holds greater value and inherently requires reproducibility. Reusable datasets and simulations allow for transparent and reliable decision support, analysis, and benchmarking studies. Reusable research software can foster composition and faster development of complex projects, while avoiding the reinvention of complicated data structures and algorithms. Establishing reproducible workflows and compute environments is vital to enable and ensure reusability. Prioritising reproducible workflows is crucial for individual use, while both reproducible compute environments and workflows are essential for broader accessibility and reuse by others. We present the challenges faced in developing reproducible workflows and compute environments, along with solution strategies and recommendations drawn from experience in two geohazards research projects. We discuss an object-oriented approach to simulation workflows, automated metadata extraction and data upload, and unique identification of datasets (assets) and simulation workflows (processes) through cryptographic hashes. To establish reproducible computational environments, we investigate essential factors such as software versioning and dependency management, reproducibility across the diverse hardware used by researchers, and time to first reproduction/reuse (TTFR). Finally, we explore the landscape of reproducibility in compute environments, covering language-agnostic package managers, containers, and language-specific package managers that support binary dependencies.
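The idea of identifying datasets (assets) and simulation workflows (processes) through cryptographic hashes can be illustrated with a minimal sketch. The abstract does not specify the hash function or what exactly is hashed, so everything here is an assumption for illustration: SHA-256 is chosen arbitrarily, and the function names `asset_id` and `process_id`, as well as the choice to hash a canonical JSON record of parameters and input asset identifiers, are hypothetical and not the authors' implementation.

```python
import hashlib
import json
from pathlib import Path


def asset_id(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes.

    Hypothetical 'asset' identifier: identical file contents always
    yield the same ID, regardless of file name or location.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def process_id(parameters, input_asset_ids):
    """Return a SHA-256 hex digest identifying a workflow run.

    Hypothetical 'process' identifier: hashes the parameters together
    with the IDs of the input assets. Keys and input IDs are sorted so
    that the ordering chosen by the caller does not change the result.
    """
    record = {"parameters": parameters, "inputs": sorted(input_asset_ids)}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two researchers who run the same parameters on byte-identical input files obtain the same process identifier, which is one simple way to detect whether a simulation has already been computed or can be reused.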