谷歌浏览器插件
订阅小程序
在清言上使用

Estimation of genome size using k-mer frequencies from corrected long reads

arXiv (Cornell University)(2020)

引用 0|浏览3
暂无评分
摘要
The third-generation long reads sequencing technologies, such as PacBio and Nanopore, have great advantages over second-generation Illumina sequencing in de novo assembly studies. However, due to the inherent low base accuracy, third-generation sequencing data cannot be used for k-mer counting and estimating genomic profile based on k-mer frequencies. Thus, in current genome projects, second-generation data is also necessary for accurately determining genome size and other genomic characteristics. We show that corrected third-generation data can be used to count k-mer frequencies and estimate genome size reliably, in replacement of using second-generation data. Therefore, future genome projects can depend on only one sequencing technology to finish both assembly and k-mer analysis, which will largely decrease sequencing cost in both time and money. Moreover, we present a fast light-weight tool kmerfreq and use it to perform all the k-mer counting tasks in this work. We have demonstrated that corrected third-generation sequencing data can be used to estimate genome size and developed a new open-source C/C++ k-mer counting tool, kmerfreq, which is freely available at https://github.com/fanagislab/kmerfreq.
更多
查看译文
关键词
genome size,reads,k-mer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要