q-gram hash comparison based multiple exact string matching algorithm for DNA sequences

Gazi Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi (2023)

Abstract
Exact string matching algorithms are an important research topic in computer science because of their applications in fields such as medicine, bioinformatics, and biology. New algorithms have been developed recently that accelerate string matching on text. String matching algorithms are divided into two groups: single and multiple. Multiple exact string matching algorithms find d patterns (P) in a given text T. This study discusses the Wu-Manber algorithm, one of the hash-based multiple exact string matching algorithms. Although the Wu-Manber algorithm is effective, it has some limitations, such as hash collisions. This study proposes a new approach to address these limitations. Unlike the traditional Wu-Manber algorithm, the proposed approach searches the sequences by q-gram hash comparison, using a hash function that eliminates hash collisions in DNA sequences. The proposed approach was compared with well-known multiple exact string matching algorithms from the literature on the E. coli and Human Chromosome 1 datasets. In the experiments, the proposed approach achieved better results than the Wu-Manber algorithm in terms of performance metrics such as average runtime and the average number of character and hash comparisons. The proposed approach is also shown to be more efficient than well-known algorithms such as Aho-Corasick (AC) and Commentz-Walter (CW).
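To illustrate why a q-gram hash can be collision-free on DNA, here is a minimal sketch, not the algorithm from the paper. It assumes a 2-bit encoding of the alphabet {A, C, G, T}, so a q-gram packs into a unique 2q-bit integer. The names `CODE`, `qgram_hash`, and `find_all` are hypothetical, and the search indexes each pattern by its first q-gram with a rolling hash rather than using Wu-Manber's shift table. Characters outside A, C, G, T (such as N) are not handled and raise a `KeyError`.

```python
# Illustrative sketch only; not the method proposed in the paper.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit nucleotide encoding


def qgram_hash(s, start, q):
    """Pack the q nucleotides starting at `start` into one base-4 integer."""
    h = 0
    for ch in s[start:start + q]:
        h = (h << 2) | CODE[ch]
    return h


def find_all(text, patterns, q=4):
    """Return sorted (position, pattern) pairs for every exact occurrence."""
    if any(len(p) < q for p in patterns):
        raise ValueError("every pattern must have at least q characters")

    # Bucket the patterns by the hash of their first q-gram.
    buckets = {}
    for p in patterns:
        buckets.setdefault(qgram_hash(p, 0, q), []).append(p)

    hits = []
    if len(text) < q:
        return hits

    mask = (1 << (2 * q)) - 1  # keeps only the lowest 2q bits
    h = qgram_hash(text, 0, q)
    for i in range(len(text) - q + 1):
        if i > 0:
            # Roll the window: drop the oldest base, append the newest.
            h = ((h << 2) | CODE[text[i + q - 1]]) & mask
        for p in buckets.get(h, ()):
            # Equal hashes imply equal q-grams, so only the tail is compared.
            if text.startswith(p[q:], i + q):
                hits.append((i, p))
    return sorted(hits)


if __name__ == "__main__":
    print(find_all("ACGTACGTTACG", ["ACGT", "CGTT", "TACG"]))
```

Because the encoding is injective for a fixed q, a hash match already confirms the first q characters, and character comparisons are needed only for the rest of each candidate pattern.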
Keywords
Multiple exact string matching, pattern matching, sequence analysis, hash function, Wu-Manber algorithm