gPartition: An Efficient Alignment Partitioning Program for Genome Datasets

Le Kim Thu, Do Duc Dong, Bui Ngoc Thang, Hoang Thi Diep, Nguyen Phuong Thao, Le Sy Vinh

VNU Journal of Science: Computer Science and Communication Engineering (2022)

Abstract
Phylogenomics, or evolutionary inference based on genome alignments, is becoming prominent thanks to next-generation sequencing technologies. In model-based phylogenomics, the partition scheme has a significant impact on inference performance, in terms of both log-likelihood and computation time. Therefore, finding an optimal partition scheme, a task known as partitioning, is a critical step in a phylogenomic inference pipeline. To accomplish this, one needs to divide the alignment sites into disjoint partitions such that sites evolving under similar models fall into the same partition. Computational partitioning is a recent approach attracting increasing interest because it can model site-rate heterogeneity within a single gene. However, state-of-the-art computational partitioning methods, such as mPartition and RatePartition, are ineffective on long alignments with millions of sites. In this paper, we introduce gPartition, a new computational partitioning method that leverages both site rates and the best-fit substitution model. We conducted experiments on recently published alignments to compare gPartition with mPartition and RatePartition. gPartition was orders of magnitude faster than the other methods. In terms of AIC score, gPartition produced partition schemes that were better than or comparable to those of mPartition. gPartition also outperformed RatePartition on all examined alignments. We implemented the proposed method in the gPartition program to help researchers partition genome alignments with millions of sites more efficiently.
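The abstract does not spell out gPartition's algorithm, so the following is only a minimal Python sketch of the two general ideas it refers to: dividing alignment sites into disjoint partitions so that sites with similar estimated rates end up together, and comparing candidate schemes with the Akaike Information Criterion, AIC = 2k − 2 ln L (lower is better). It assumes per-site rates have already been estimated by some tree-inference tool. The equal-count rate binning is a simple stand-in chosen for illustration, not the gPartition, mPartition or RatePartition procedure, and the function names `partition_by_rate` and `aic` are hypothetical.

```python
from typing import Dict, List, Sequence


def partition_by_rate(site_rates: Sequence[float], n_bins: int) -> Dict[int, List[int]]:
    """Hypothetical helper: split site indices into at most n_bins disjoint
    partitions of roughly equal size, ordered by estimated rate
    (partition 0 holds the slowest-evolving sites)."""
    if n_bins < 1:
        raise ValueError("n_bins must be >= 1")
    # Rank sites from slowest to fastest by their (pre-estimated) rate.
    order = sorted(range(len(site_rates)), key=lambda i: site_rates[i])
    parts: Dict[int, List[int]] = {b: [] for b in range(n_bins)}
    for rank, site in enumerate(order):
        # Equal-count binning: each site lands in exactly one partition.
        parts[rank * n_bins // len(order)].append(site)
    # Drop empty bins and list sites in alignment order within each partition.
    return {b: sorted(sites) for b, sites in parts.items() if sites}


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Toy per-site rates (illustrative values only, not data from the paper).
rates = [0.1, 2.0, 0.5, 1.5, 0.05, 3.0]
print(partition_by_rate(rates, 2))  # {0: [0, 2, 4], 1: [1, 3, 5]}
```

Equal-count bins keep every partition large enough to fit a model; in practice the number of partitions, the binning rule and the per-partition model selection are exactly the choices that distinguish real partitioning methods, and the scheme with the lowest AIC, computed from its total log-likelihood and total number of free parameters, would be preferred.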
Keywords
efficient alignment partitioning program, genome