RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions.

Horticulture research(2023)

Cited 0|Views15
No score
Abstract
Telomere to telomere (T2T) assembly relies on the correctness of sequence alignments. However, the existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing the false-positives in the outputs of existing aligners. RAfilter takes advantage of rare mers representing the copy-specific features to differentiate false-positive alignments from the correct ones. Considering the huge numbers of rare mers in large eukaryotic genomes, a series of high-performance computing techniques such as multi-threading and bit operation are used to improve the time and space efficiencies. The experimental results on tandem repeats and interspersed repeats show that RAfilter was able to filter 60%-90% false-positive HiFi alignments with almost no correct ones removed, while the sensitivities and precisions on ONT datasets were about 80% and 50% respectively.
More
Translated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined