HMMPolish: a coding region polishing tool for TGS-sequenced RNA viruses

Briefings in Bioinformatics (2023)

Abstract
Access to accurate viral genomes is important for downstream data analysis. Third-generation sequencing (TGS) has recently become a popular platform for virus sequencing because of its long read length. However, its per-base error rate, which is higher than that of next-generation sequencing, can lead to genomes with errors. Polishing tools are thus needed to correct errors either before or after sequence assembly. Despite the promising results of available polishing tools, there is still room to improve error correction and thus obtain more accurate genome assemblies. Errors, particularly those in coding regions, can hamper analyses such as lineage identification and variant monitoring. In this work, we developed a novel pipeline, HMMPolish, for correcting (polishing) errors in protein-coding regions of known RNA viruses. This tool can be applied to either raw TGS reads or the assembled sequences of the target virus. By utilizing profile Hidden Markov Models of protein families/domains in known viruses, HMMPolish can correct errors that are ignored by available polishers. We extensively validated HMMPolish on 34 datasets that covered four clinically important viruses: HIV-1, influenza-A, norovirus, and severe acute respiratory syndrome coronavirus 2. These datasets contain reads with different properties, such as sequencing depth and platform (PacBio or Nanopore). The benchmark results against popular/representative polishers show that HMMPolish competes favorably on error correction in coding regions of known RNA viruses.
Keywords
Third-generation sequencing (TGS), polisher for TGS assemblies, ribonucleic acid (RNA) viruses, profile HMMs