Genealogical inference and more flexible sequence clustering using iterative PopPUNK

Genome Research (2023)

Abstract
Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome datasets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free method, PopPUNK, to increase its flexibility and interpretability across datasets. Our method, iterative-PopPUNK, rapidly produces multiple consistent cluster assignments across a range of sequence identities. By constructing a partially resolved genealogical tree with respect to these clusters, users can select the resolution most appropriate for their needs. We demonstrated the accuracy of clusters at all levels of similarity and of the genealogical inference of iterative-PopPUNK using simulated data, and obtained phylogenetically concordant results in real datasets from seven bacterial species. Using two example sets, one of Escherichia/Shigella genomes and one of Vibrio parahaemolyticus genomes, we show that iterative-PopPUNK can achieve cluster resolutions ranging from phylogroup down to sequence typing (ST). The iterative-PopPUNK algorithm is implemented in the ‘PopPUNK_iterate’ program, available as part of the PopPUNK package.

Competing Interest Statement: The authors have declared no competing interest.
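The idea of consistent cluster assignments at several resolutions, linked into a partially resolved tree, can be illustrated with a toy sketch. This is not the iterative-PopPUNK algorithm or its API: it assumes a small hand-written pairwise distance matrix, and it swaps in plain single-linkage clustering (connected components under a distance cutoff) for whatever model PopPUNK fits. The names `nested_clusters`, `parent_links`, `DIST`, and the cutoff values are all hypothetical.

```python
from itertools import combinations


def connected_components(n, edges):
    """Group items 0..n-1 into connected components using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    # Sort clusters by their smallest member for a deterministic order.
    return sorted((frozenset(g) for g in groups.values()), key=min)


def nested_clusters(dist, cutoffs):
    """Cluster at each cutoff, from loosest to tightest.

    Tightening the cutoff only removes edges, so every cluster at a
    tighter level sits inside exactly one cluster of the looser level.
    """
    n = len(dist)
    levels = []
    for cutoff in sorted(cutoffs, reverse=True):
        edges = [(i, j) for i, j in combinations(range(n), 2)
                 if dist[i][j] <= cutoff]
        levels.append((cutoff, connected_components(n, edges)))
    return levels


def parent_links(levels):
    """Link each finer cluster to the coarser cluster that contains it."""
    links = []
    for (_, coarse), (_, fine) in zip(levels, levels[1:]):
        for child in fine:
            parent = next(c for c in coarse if child <= c)
            links.append((sorted(parent), sorted(child)))
    return links


# Hypothetical distances between four genomes (assumed values).
DIST = [
    [0.00, 0.01, 0.05, 0.30],
    [0.01, 0.00, 0.05, 0.30],
    [0.05, 0.05, 0.00, 0.30],
    [0.30, 0.30, 0.30, 0.00],
]

LEVELS = nested_clusters(DIST, [0.10, 0.02])
LINKS = parent_links(LEVELS)

if __name__ == "__main__":
    for cutoff, clusters in LEVELS:
        print(cutoff, [sorted(c) for c in clusters])
    for parent, child in LINKS:
        print(parent, "->", child)
```

Because the clusters nest, a user of such a scheme can read off a grouping at whichever cutoff suits the question, which mirrors the choice of resolution described in the abstract.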