Phylogenetic tree distance computation over succinct representations
CoRR (2023)
Abstract
There are several tools available to infer phylogenetic trees, which depict
the evolutionary relationships among biological entities such as viral and
bacterial strains in infectious outbreaks, or cancerous cells in tumor
progression trees. These tools rely on a variety of inference methods, and the
resulting trees are not unique. Thus,
methods for comparing phylogenies that reveal where two phylogenetic trees
agree or differ are required. One approach is to compute a similarity or
dissimilarity measure between trees; the Robinson-Foulds distance is one of
the most widely used, and it can be computed in linear time
and space. Nevertheless, given the large and increasing volume of phylogenetic
data, phylogenetic trees are becoming very large, with hundreds of thousands of
leaves. In this context, space becomes a concern both when computing
tree distances and when storing trees. We therefore propose an efficient
implementation of the Robinson-Foulds distance over succinct tree
representations. Our implementation also generalizes the Robinson-Foulds
distance to labelled phylogenetic trees, i.e., trees with labels on all
nodes rather than only on the leaves. Experimental results show that this
approach still achieves linear time while requiring less space. Our implementation is
available as an open-source tool at
https://github.com/pedroparedesbranco/TreeDiff.
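To make the two ideas in the abstract concrete, here is a minimal Python sketch, not the TreeDiff implementation. It assumes a rooted tree stored in a balanced-parentheses (BP) encoding: a preorder traversal writes "(" on entering a node and ")" on leaving it, so n nodes take 2n symbols, plus a separate label array in preorder. The Robinson-Foulds distance is taken here as the number of clusters (label sets of subtrees) present in one tree but not the other; for unrooted trees one uses bipartitions instead, and some authors halve this count. The labelled-tree variant is an assumption: every node's label, not only leaf labels, is added to its cluster, and the definition used by the tool may differ. The function names clusters_from_bp and robinson_foulds are hypothetical. The sketch uses a plain string rather than a packed bit vector with rank/select support, and building each frozenset costs time proportional to the subtree size, so it is worst-case quadratic rather than linear.

```python
from typing import FrozenSet, List, Set, Tuple


def clusters_from_bp(bp: str, labels: List[str],
                     leaves_only: bool = True) -> Set[FrozenSet[str]]:
    """Return the cluster of every internal node of a BP-encoded rooted tree.

    bp:     '(' when entering a node, ')' when leaving it, in preorder.
    labels: one label per node, in the order of the '(' symbols (preorder).
    leaves_only: True gives classic leaf-labelled clusters; False (an assumed
        labelled-tree variant) also adds each internal node's own label.
    """
    clusters: Set[FrozenSet[str]] = set()
    # Each open node keeps its label, whether it is a leaf, and the labels
    # collected so far from its closed descendants.
    stack: List[Tuple[str, bool, Set[str]]] = []
    next_label = 0
    for i, sym in enumerate(bp):
        if sym == "(":
            # A node is a leaf exactly when it closes immediately: "()".
            is_leaf = i + 1 < len(bp) and bp[i + 1] == ")"
            stack.append((labels[next_label], is_leaf, set()))
            next_label += 1
        else:
            label, is_leaf, acc = stack.pop()
            if is_leaf or not leaves_only:
                acc.add(label)
            if not is_leaf:
                clusters.add(frozenset(acc))
            if stack:
                # Propagate this subtree's labels to the parent node.
                stack[-1][2].update(acc)
    return clusters


def robinson_foulds(bp1: str, labels1: List[str], bp2: str, labels2: List[str],
                    leaves_only: bool = True) -> int:
    """Count clusters found in exactly one of the two trees."""
    c1 = clusters_from_bp(bp1, labels1, leaves_only)
    c2 = clusters_from_bp(bp2, labels2, leaves_only)
    return len(c1 ^ c2)
```

For example, the rooted trees ((a,b)x,c)r and ((a,c)x,b)r share the shape "((()())())" and differ only in their preorder label arrays, ["r", "x", "a", "b", "c"] and ["r", "x", "a", "c", "b"]. Each tree has one cluster the other lacks ({a,b} versus {a,c}), so robinson_foulds returns 2 for them.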