
Kcollections: A Fast and Efficient Library for K-mers

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (2020)

Abstract
K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to store and use efficiently because the number of possible k-mers grows exponentially with k. Many algorithms exist for compressed storage of k-mers, but they either suffer from slow insert times or are probabilistic, resulting in false-positive k-mers. Furthermore, k-mer libraries usually specialize in associating specific values with k-mers, such as a color in colored de Bruijn graphs or a k-mer count. We present kcollections, a compressed and parallel data structure designed for k-mers generated from whole, assembled genomes. Kcollections is available for C++ and provides set- and map-like structures as well as a k-mer counting data structure, all of which use parallel operations designed around a MapReduce paradigm. Additionally, we provide basic Python bindings for rapid prototyping. Kcollections makes developing bioinformatic algorithms simpler by abstracting away the tedious task of storing k-mers.
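To make the ideas in the abstract concrete, the following is a minimal sketch of k-mer counting in a map-then-reduce style, written in plain Python. It does not use the kcollections API and is not the authors' compressed data structure; the names `encode_kmer`, `count_chunk`, and `count_kmers` are hypothetical, and the 2-bit-per-base packing is a common convention assumed here for illustration. Each input string is treated as an independent sequence, so k-mers spanning two inputs are not counted.

```python
from collections import Counter
from functools import reduce

# Assumed 2-bit encoding of the four DNA bases (illustrative convention).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer using 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def count_chunk(seq, k):
    """Map step: count the encoded k-mers of one sequence."""
    return Counter(encode_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1))


def count_kmers(chunks, k):
    """Reduce step: merge the per-sequence counts into one table."""
    partial_counts = (count_chunk(chunk, k) for chunk in chunks)
    return reduce(lambda a, b: a + b, partial_counts, Counter())


counts = count_kmers(["ACGTAC", "GTACGT"], k=3)
```

Because the map step touches each sequence independently, the per-sequence counts could be computed in parallel before merging. Note also that there are 4^k possible k-mers, which is the exponential growth that motivates compressed storage.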
Key words
data structure, genomics, k-mer, parallel programming