Reproducible Scientific Workflows for High Performance and Cloud Computing

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019

Abstract
Many complex data analysis tasks are performed by scientific workflows and pipelines deployed on high performance computing (HPC) or cloud computing resources. The complex software stack a workflow requires, together with unnoticed dependencies, can make deploying a pipeline a demanding task. Once deployed, workflows tend to be black boxes, especially for users who did not create the pipeline themselves. At the end of a project, a researcher should archive the pipeline to ensure the reproducibility of published results. This paper illustrates a possible solution for each of these three tasks: reproducible deployment via software containers, automated generation of provenance information to open up black boxes, and archiving of software containers with the CiTAR service.
Keywords
workflow, deployment, reproducibility, container, provenance, archiving