Spectr: Scalable Parallel Short Read Error Correction On Multi-Core And Many-Core Architectures

Kai Xu,Robin Kobus,Yuandong Chan,Ping Gao,Xiangxu Meng,Yanjie Wei,Bertil Schmidt,Weiguo Liu

PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING（2018）

引用 24|浏览44

暂无评分

摘要

Modern high throughput sequencing platforms can produce large amounts of short read DNA data at low cost. Error correction is an important but time-consuming initial step when processing this data in order to improve the quality of downstream analyses. In this paper, we present a Scalable Parallel Error CorrecToR designed to improve the throughput of DNA error correction for Illumina reads on various parallel platforms. Our design is based on a k-spectrum approach where a Bloom filter is frequently probed as a key operation and is optimized towards AVX-512-based multi-core CPUs, Xeon Phi many-cores (both KNC and KNL), and heterogeneous compute clusters. A number of architecture-specific optimizations are employed to achieve high performance such as memory alignment, vectorized Bloom filter probing, and a stack-based iteration to eliminate recursion. Our experiments show that our optimizations result in speedups of up to 2.8, 5.2, and 9.3 on a CPU (Xeon W-2123), a KNC-based Xeon Phi (31S1P), and a KNL-based Xeon Phi (7210), respectively, compared to a multi-threaded CPU reference implementation for the error correction stage. Furthermore, when executed on the same hardware, SPECTR achieves a speedup of up to 1.7, 2.1, 2.4, and 6.4, compared to the state-of-the-art tools Lighter, BLESS2, RECKONER, and Musket, respectively. In addition, our MPI implementation exhibits an efficiency of around 86% when executed on 32 nodes of the Tianhe-2 supercomputer. SPECTR is available at https://github.com/Xu-Kai/SPECTR.

查看译文

关键词

Bioinformatics, Xeon Phi, Vectorization, MPI, Bloom filter

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要