Hadoop-BAM

Bioinformatics(2012)

引用 97|浏览40
暂无评分
摘要
Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps. Availability: Available under the open-source MIT license at http://sourceforge.net/projects/hadoop-bam/ Contact: matti.niemenmaa@aalto.fi Supplementary information:Supplementary material is available at Bioinformatics online.
更多
查看译文
关键词
analysis step,BAM record,Picard API,Bioinformatics online,BAM file,next-generation sequencing data,analysis application,Picard SAM JDK,BAM data access,convenient API
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要