Contract-Driven Design of Scientific Data Analysis Workflows

2023 IEEE 19th International Conference on e-Science (e-Science), 2023

Abstract
Software systems enabling large-scale data analysis workflows (DAWs) are a key technology for many scientific disciplines, as they allow extracting new insights from experimental results. DAWs are (non-)linear pipelines composed of multiple interdependent tasks that are executed in a distributed fashion on large compute clusters. In science, the individual task implementations are developed by research groups all over the world and are usually not tested outside a narrow scope of possible inputs, parameters, and infrastructures. As a result, the correctness of these operations depends on many implicit assumptions. These include, among others, the completeness and suitability of input data and infrastructure properties such as available cores or main memory. This combination of complexity, distribution, and untested components makes quality assurance of DAWs a critical issue. In this paper, we propose to address this problem by introducing a contract-driven approach to DAW design and implementation. Following this method, DAW developers specify contracts in the form of requirements and promises for each task of a DAW. These contracts serve as guards to ensure that tasks run in a proper environment and produce correct results. We provide the first formal definition of contracts for DAWs and show how they are connected to DAW scheduling and execution. As a proof of concept, we extended Nextflow, a popular scientific workflow system, with contracts and defined a lightweight DSL for their specification. We exemplify the power of a contract-driven approach to DAW development by enhancing several real-world DAWs from bioinformatics to capture typical problems during their execution, and we show how the specific notifications issued by broken contracts help in debugging the DAWs.
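
The idea of guarding each task with requirements (checked before the task runs) and promises (checked on its outputs) can be illustrated with a minimal Python sketch. This is not the paper's Nextflow extension or its DSL, whose syntax is not given in the abstract; the names `Task`, `ContractViolation`, `count_reads`, and the specific checks are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


class ContractViolation(Exception):
    """Raised when a requirement or promise of a task does not hold."""


# A check is a human-readable label plus a predicate over a dict of values.
Check = Tuple[str, Callable[[Dict[str, Any]], bool]]


@dataclass
class Task:
    """Hypothetical workflow task guarded by requirements and promises."""
    name: str
    run: Callable[[Dict[str, Any]], Dict[str, Any]]
    requires: List[Check] = field(default_factory=list)
    promises: List[Check] = field(default_factory=list)

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Requirements guard the task: fail before running on bad inputs.
        for label, check in self.requires:
            if not check(inputs):
                raise ContractViolation(
                    f"{self.name}: requirement broken: {label}")
        outputs = self.run(inputs)
        # Promises guard the results: fail if outputs look incorrect.
        for label, check in self.promises:
            if not check(outputs):
                raise ContractViolation(
                    f"{self.name}: promise broken: {label}")
        return outputs


def count_reads(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy task body (an assumption): count the sequence reads it is given."""
    return {"n_reads": len(inputs["reads"])}


count_task = Task(
    name="count_reads",
    run=count_reads,
    requires=[
        ("input has at least one read",
         lambda i: len(i.get("reads", [])) > 0),
        ("at least 2 cores available",
         lambda i: i.get("cores", 0) >= 2),
    ],
    promises=[
        ("read count is positive", lambda o: o["n_reads"] > 0),
    ],
)
```

Calling `count_task.execute({"reads": ["ACGT", "TTGA"], "cores": 4})` passes both requirements and the promise, whereas an empty `reads` list raises a `ContractViolation` whose message names the broken requirement. A specific notification of this kind is the sort of debugging aid the abstract refers to.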
Keywords
scientific computing, data analysis workflows, design by contract, validity, robustness