Trycycler: consensus long-read assemblies for bacterial genomes

bioRxiv (Cold Spring Harbor Laboratory)(2021)

Abstract
Assembly of bacterial genomes from long-read data (generated by Oxford Nanopore or Pacific Biosciences platforms) can often be complete: a single contig for each chromosome or plasmid in the genome. However, even complete bacterial genome assemblies constructed solely from long reads still contain a variety of errors, and different assemblies of the same genome often contain different errors. Here, we present Trycycler, a tool that produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking using both simulated and real sequencing reads showed that Trycycler consensus assemblies contained fewer errors than any of those constructed with a single long-read assembler. Post-assembly polishing with Medaka and Pilon further reduced errors and yielded the most accurate genome assemblies in our study. As Trycycler can require human judgement and manual intervention, its output is not deterministic, and different users can produce different Trycycler assemblies from the same input data. However, we demonstrated that multiple users with minimal training converge on similar assemblies that are consistently more accurate than those produced by automated assembly tools. We therefore recommend Trycycler+Medaka+Pilon as an ideal approach for generating high-quality bacterial reference genomes.

Data availability
Supplementary figures, tables and code can be found at: github.com/rrwick/Trycycler-paper
Reads, assemblies and reference sequences can be found at: bridges.monash.edu/articles/dataset/Trycycler_paper_dataset/14890734
Keywords
assemblies, long-read