Median and Small Parsimony Problems on RNA trees
CoRR (2024)
Abstract
Motivation: Non-coding RNAs (ncRNAs) express their functions by adopting
molecular structures. Specifically, RNA secondary structures serve as a
relatively stable intermediate step before tertiary structures, offering a
reliable signature of molecular function. Consequently, within an RNA
functional family, secondary structures are generally more evolutionarily
conserved than sequences. Conversely, homologous RNA families grouped within an
RNA clan share ancestors but typically exhibit structural differences.
Inferring the evolution of RNA structures within RNA families and clans is
crucial for gaining insights into functional adaptations over time and
providing clues about the Ancient RNA World Hypothesis.

Results: We introduce
the median problem and the small parsimony problem for ncRNA families, where
secondary structures are represented as leaf-labelled trees. We utilize the
Robinson-Foulds (RF) tree distance, which corresponds to a specific edit
distance between RNA trees, and a new metric called the Internal-Leafset (IL)
distance. While the RF tree distance compares sets of leaves descending from
internal nodes of two RNA trees, the IL distance compares the collection of
leaf-children of internal nodes. The latter is better at capturing differences
in structural elements of RNAs than the RF distance, which is more focused on
base pairs. We also consider a more general tree edit distance that allows the
mapping of base pairs that are not perfectly aligned. We study the theoretical
complexity of the median problem and the small parsimony problem under the
three distance metrics and various biologically-relevant constraints, and we
present polynomial-time maximum parsimony algorithms for solving some versions
of the problems. Our algorithms are applied to ncRNA families from the RFAM
database, illustrating their practical utility.