Harnessing machine translation methods for sequence alignment

biorxiv(2022)

引用 0|浏览15
暂无评分
摘要
The sequence alignment problem is one of the most fundamental problems in bioinformatics and a plethora of methods were devised to tackle it. Here we introduce BetaAlign, a novel methodology for aligning sequences using a natural language processing (NLP) approach. BetaAlign accounts for the possible variability of the evolutionary process among different datasets by using an ensemble of transformers, each trained on millions of samples generated from a different evolutionary model. Our approach leads to outstanding alignment accuracy, often outperforming commonly used methods, such as MAFFT, DIALIGN, ClustalW, T-Coffee, and MUSCLE. Notably, the utilization of deep-learning techniques for the sequence alignment problem brings additional advantages, such as automatic feature extraction that can be leveraged for a variety of downstream analysis tasks. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
关键词
sequence alignment,machine translation methods
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要