Reconstructing The Gigabase Plant Genome Of Solanum pennellii Using Nanopore Sequencing

bioRxiv(2017)

Abstract
Recent updates in sequencing technology have made it possible to obtain gigabases of sequence data from a single flowcell. Prior to this update, nanopore sequencing technology was mainly used to analyze and assemble microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing dataset with a median fragment size of 11,979 bp for the wild tomato species Solanum pennellii, which has an estimated genome size of ca. 1.0 to 1.1 Gbases. We describe its genome assembly to a contig N50 of 2.5 Mb using a pipeline comprising a Canu pre-processing step and a subsequent assembly using SMARTdenovo. We show that the resulting nanopore-based de novo genome reconstruction is structurally highly similar to the reference S. pennellii LA716 genome but has a high error rate, caused mostly by deletions in homopolymers. After polishing the assembly with Illumina short-read data, we obtained an error rate of <0.02% when assessed against the same Illumina data. More importantly, however, we obtained a gene completeness of 96.53%, which even slightly surpasses that of the reference S. pennellii genome. Taken together, our data indicate that such long-read sequencing data can be used to affordably sequence and assemble Gbase-sized diploid plant genomes. Raw data are available at http://www.plabipd.de/portal/solanum-pennellii and have been deposited as PRJEB19787.
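For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not values or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in bp), not data from the paper.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

A higher N50 means that half of the assembled sequence sits in longer contigs, which is why the statistic is used as a rough indicator of assembly contiguity.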