Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems

Marine Djaffardjy, George Marchment, Clemence Sebe, Raphael Blanchet, Khalid Bellajhame, Alban Gaignard, Frederic Lemoine, Sarah Cohen-Boulakia

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2023)

Abstract
Data analysis pipelines are now established as an effective means of specifying and executing bioinformatics data analyses and experiments. While scripting languages such as Python and R, as well as notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are not enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high-performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and maps existing solutions to the requirements they fulfill. We then highlight the benefits of using scientific workflow systems to obtain modular, reproducible and reusable bioinformatics data analysis pipelines. Finally, we discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.

© 2023 Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Scientific workflows, Bioinformatics, Reuse, Reproducibility