
Performance evaluation and tuning of BioPig for genomic analysis.

DISCS@SC(2015)

Abstract
In this study, we aim to optimize Hadoop parameters to improve the performance of BioPig on Amazon Web Services (AWS). BioPig is a toolkit for large-scale sequencing data analysis. It is built on Hadoop and Pig, which enables easy parallel programming and scaling to terabyte-sized datasets. AWS, offered by Amazon, is the most popular cloud-computing platform. When BioPig jobs run on AWS, the default configuration parameters may lead to high computational costs. We selected k-mer counting as our test case because it is used in a large number of next-generation sequencing (NGS) data analysis tools. Starting from a baseline configuration, we tuned Hadoop parameters from five different perspectives and found that tuning different parameters led to different performance improvements. With an optimized set of parameters, the overall job execution time of k-mer counting on BioPig was reduced by 50%. This paper documents our tuning experiments as a valuable reference for future Hadoop-based analytics applications on genomics datasets.
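BioPig runs on top of Pig, which compiles scripts into Hadoop MapReduce jobs. To show what the k-mer counting workload asks of Hadoop, here is a minimal, single-process Python sketch of the map, shuffle, and reduce phases such a job goes through. It is an illustration under stated assumptions, not BioPig's implementation: the function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) are hypothetical, and so is the choice to skip k-mers that contain bases other than A, C, G, or T.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every length-k substring of one read."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        # Assumption for illustration: skip k-mers containing ambiguous bases such as 'N'.
        if set(kmer) <= set("ACGT"):
            yield kmer, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the partial counts for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all reads in one process."""
    mapped = chain.from_iterable(map_kmers(r, k) for r in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGN"]
    for kmer, n in sorted(count_kmers(reads, k=3).items()):
        print(kmer, n)
    # prints:
    # ACG 2
    # CGT 1
    # GTA 2
    # TAC 2
```

On a real cluster the shuffle is distributed: map outputs are sorted, spilled to disk, and transferred across the network to the reducers. Hadoop configuration parameters govern how this work is split and buffered, which is why the defaults can be expensive for a workload that emits one key-value pair per k-mer.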
Keywords
BioPig