
Can Formal Languages Help Pangenomics to Represent and Analyze Multiple Genomes?

International Conference on Developments in Language Theory (DLT), 2022

Abstract
Graph pangenomics is an emerging field in computational biology that is changing the traditional view of a reference genome. Instead of a linear sequence, the reference becomes a sequence graph (a pangenome graph, or simply a pangenome) that represents the main similarities and differences among multiple evolutionarily related genomes. Advances in sequencing technologies now produce large amounts of genome data quickly, and this pace far outstrips the slow progress in developing new methods for constructing and analyzing a pangenome. Most recent advances in the field are still based on notions rooted in well-established, fairly old literature on combinatorics on words, formal languages and space-efficient data structures. In this paper we discuss two novel notions that may help in managing and analyzing multiple genomes by addressing a relevant question: how can we summarize sequence similarities and dissimilarities in large sequence data? The first notion is related to variants of the Lyndon factorization and makes it possible to represent sequence similarities for a sample of reads. The second is the notion of a sample-specific string, which serves as a tool to detect differences in a sample of reads. We discuss the new perspectives opened by these two notions.
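The two notions in the abstract rest on simple string primitives. The sketch below is illustrative only and makes two assumptions.

- It computes the standard Lyndon factorization with Duval's linear-time algorithm. This is not the paper's variants of the factorization.
- It uses a simplified, hypothetical reading of a "sample-specific string": a substring of a read that is absent from a reference string, while its two maximal proper substrings are both present. Under that reading it is a minimal absent word relative to the reference.

The function names are hypothetical. The specific-string search is a naive quadratic scan, not an index-based method.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Standard Lyndon factorization via Duval's algorithm (not the paper's variants).

    Returns Lyndon words w1 >= w2 >= ... >= wk whose concatenation is s.
    """
    factors: list[str] = []
    n = len(s)
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repetitions of a Lyndon word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit each complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def minimal_specific_strings(read: str, reference: str) -> set[str]:
    """Hypothetical simplification of sample-specific strings.

    Returns substrings of `read` absent from `reference` whose longest proper
    prefix and longest proper suffix both occur in `reference`.
    Naive scan using Python's substring test.
    """
    found: set[str] = set()
    n = len(read)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if read[i:j] not in reference:
                # read[i:j-1] occurs in reference because j is the first failing end;
                # keep the string only if dropping its first character also occurs.
                if read[i + 1:j] in reference:
                    found.add(read[i:j])
                break
    return found


if __name__ == "__main__":
    print(lyndon_factorization("banana"))
    print(sorted(minimal_specific_strings("ACGAACGT", "ACGTACGT")))
```

On this toy input the read differs from the reference by a single substitution (T to A at position 3). The two reported strings, "AA" and "GA", both span that mismatch. This illustrates how minimal absent strings can flag differences between a sample and a reference.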
Key words
analyze multiple genomes, pangenomics, formal languages