Spectr: Scalable Parallel Short Read Error Correction On Multi-Core And Many-Core Architectures

PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING(2018)

引用 24|浏览44
暂无评分
摘要
Modern high throughput sequencing platforms can produce large amounts of short read DNA data at low cost. Error correction is an important but time-consuming initial step when processing this data in order to improve the quality of downstream analyses. In this paper, we present a Scalable Parallel Error CorrecToR designed to improve the throughput of DNA error correction for Illumina reads on various parallel platforms. Our design is based on a k-spectrum approach where a Bloom filter is frequently probed as a key operation and is optimized towards AVX-512-based multi-core CPUs, Xeon Phi many-cores (both KNC and KNL), and heterogeneous compute clusters. A number of architecture-specific optimizations are employed to achieve high performance such as memory alignment, vectorized Bloom filter probing, and a stack-based iteration to eliminate recursion. Our experiments show that our optimizations result in speedups of up to 2.8, 5.2, and 9.3 on a CPU (Xeon W-2123), a KNC-based Xeon Phi (31S1P), and a KNL-based Xeon Phi (7210), respectively, compared to a multi-threaded CPU reference implementation for the error correction stage. Furthermore, when executed on the same hardware, SPECTR achieves a speedup of up to 1.7, 2.1, 2.4, and 6.4, compared to the state-of-the-art tools Lighter, BLESS2, RECKONER, and Musket, respectively. In addition, our MPI implementation exhibits an efficiency of around 86% when executed on 32 nodes of the Tianhe-2 supercomputer. SPECTR is available at https://github.com/Xu-Kai/SPECTR.
更多
查看译文
关键词
Bioinformatics, Xeon Phi, Vectorization, MPI, Bloom filter
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要