Abstract 5351: Retrospective analysis of cancer exomes with Roslin, a portable and reproducible workflow infrastructure

Cancer Research (2018)

Abstract
Reproducibility and portability are persistent problems in the analysis of genomic sequence data. Bioinformatics pipelines usually run only in the controlled computing environment in which they were built. Even when pipelines are built for portability, it is increasingly complex and sometimes impossible to maintain versioned and reproducible pipeline components such as reference genomes, sequence aligners, variant callers, and annotation sources. As a solution to these technical challenges, we present Roslin, a bioinformatics workflow system deployed at Memorial Sloan Kettering Cancer Center. We validate its precision and recall rates for detecting somatic alterations, microsatellite instability, and other relevant biomarkers from retrospective exome recapture of DNA from over 1000 patients who previously underwent clinical sequencing using MSK-IMPACT. Roslin is written in the Common Workflow Language (CWL), a standard specification requiring tasks to be modularized and inputs and outputs to be explicitly defined. The requirements of explicitness and modularization enable CWL workflows to be flexible, portable, scalable, and amenable to container technologies such as Docker and Singularity. Roslin utilizes the Toil workflow manager from the University of California at Santa Cruz, a portable, open-source workflow engine that supports CWL and is designed to run scientific workflows securely, reproducibly, and efficiently at scale. Roslin leverages Singularity, the Lawrence Berkeley National Laboratory container system that is notionally similar to Docker. The Singularity container system is designed for mobility of compute and reproducibility of scientific analysis. Singularity containers are used to package complete scientific workflows, software and libraries, and data. Combining these makes Roslin well suited to run versioned bioinformatics workflows on cluster, cloud, and high-performance computing environments at scale.
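The precision and recall validation mentioned above can be illustrated with a minimal sketch. It assumes that variant calls are compared by exact match on chromosome, position, reference allele, and alternate allele; the abstract does not describe Roslin's actual matching criteria. The function name `precision_recall` and the toy variants are hypothetical and exist only for illustration.

```python
from typing import Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
# Exact-match comparison is an assumption, not Roslin's documented method.
Variant = Tuple[str, int, str, str]


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a call set against a truth set."""
    true_positives = len(called & truth)
    # Precision: fraction of called variants that are in the truth set.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of truth variants that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy, made-up variants (not real data).
truth = {("chr1", 100, "A", "T"), ("chr2", 200, "C", "G"), ("chr3", 300, "G", "A")}
called = {("chr1", 100, "A", "T"), ("chr2", 200, "C", "G"), ("chr4", 400, "T", "C")}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case two of the three called variants are in the truth set, and two of the three truth variants were called, so both rates are 2/3.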
Roslin has been deployed and tested on multiple high-performance computational clusters and cloud computing resources. It supports complete versioning of its workflows, the underlying software and libraries, and associated resource files. It offers end users GUI-driven workflow logging, run reporting, and real-time tracking. The Roslin CWL workflows are also suitable for deployment and execution on platforms without Toil, as Docker versions of every Singularity container are also provided (https://github.com/mskcc/roslin).

Citation Format: Christopher Harris, Jaeyoung Chun, Cyriac Kandoth, Nikhil Kumar, Shweta Chavan, Ronak Shah, Ewa Reza, Aaron Gabow, Christopher A. Bolipata, Barry Taylor, Oliver A. Hampton, Nicholas D. Socci, David Solit. Retrospective analysis of cancer exomes with Roslin, a portable and reproducible workflow infrastructure [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 5351.