FastAAI: Efficient Estimation of Genome Average Amino Acid Identity and Phylum-level relationships using Tetramers of Universal Proteins

crossref(2022)

Abstract
Estimation of whole-genome relatedness and taxonomic identification are two important bioinformatics tasks in describing environmental or clinical microbiomes. The genome-aggregate Average Nucleotide Identity (ANI) is routinely used to derive the relatedness of closely related (species-level) microbial and viral genomes, but it is not appropriate for more divergent genomes. Average Amino Acid Identity (AAI) can be used in the latter cases, but no current AAI implementation can efficiently compare thousands of genomes. Here we present FastAAI, a tool that estimates whole-genome pairwise relatedness from shared tetramers of universal proteins in a matter of microseconds, providing a speedup of up to five orders of magnitude over current methods of calculating AAI or alternative whole-genome metrics. Further, FastAAI resolves distantly related genomes (those related at the phylum level) with accuracy comparable to the phylogeny of ribosomal RNA (rRNA) genes, substantially improving on a known limitation of current AAI implementations. FastAAI therefore uniquely expands the toolbox for microbiome analysis and allows it to scale to millions of genomes.
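To make the idea of "shared tetramers of universal proteins" concrete, the following is a minimal sketch and not the FastAAI implementation. It assumes that each genome is given as a mapping from a universal-protein name to its amino acid sequence, that similarity per protein is the Jaccard index of the two tetramer sets, and that the genome-level score is the plain mean over proteins present in both genomes. The function names, the input layout, and the choice of Jaccard index and averaging are all illustrative assumptions; the abstract does not specify how shared tetramers are converted into an AAI estimate.

```python
from typing import Dict, Optional, Set


def tetramers(seq: str) -> Set[str]:
    """Return the set of overlapping 4-residue substrings of a protein sequence."""
    return {seq[i:i + 4] for i in range(len(seq) - 3)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_shared_tetramer_similarity(
    genome_a: Dict[str, str], genome_b: Dict[str, str]
) -> Optional[float]:
    """Average per-protein tetramer Jaccard index over proteins found in both genomes.

    Hypothetical helper: genomes are dicts of {protein_name: amino_acid_sequence}.
    Returns None when the genomes share no protein names.
    """
    shared = genome_a.keys() & genome_b.keys()
    if not shared:
        return None
    scores = [
        jaccard(tetramers(genome_a[name]), tetramers(genome_b[name]))
        for name in shared
    ]
    return sum(scores) / len(scores)
```

Because each comparison reduces to set intersections on short, fixed-length words, this kind of calculation avoids pairwise sequence alignment, which is one plausible reason a tetramer-based approach can be much faster than alignment-based AAI.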