Longest k-tuple Common Sub-Strings.

BIBM (2022)

Abstract
We study a new problem: finding a longest k-tuple of common sub-strings (abbr. k-CSS) of two or more strings. We present a suffix-tree-based algorithm that finds a longest k-CSS of m strings in $O(kmn^{k})$ time and $O(kmn)$ space, where n is the total length of the m strings. The algorithm can also approximate the longest k-CSS problem to a performance ratio of $\frac{1}{\epsilon}$ in $O(kmn^{\lceil\epsilon k\rceil})$ time for $\epsilon\in(0,1]$. Because its space complexity is linear in n, the algorithm is advantageous when comparing particularly long strings. The algorithm also proves that finding a longest gapped pattern of a non-constant number of strings is solvable in polynomial time if the number of gaps is restricted to a constant, although the problem without any restriction on the number of gaps was proved NP-hard. Using a C++ tool built on the algorithm, we performed experiments to find longest 2-CSSs, 3-CSSs and 5-CSSs of 2 to 14 COVID-19 S-proteins. With the help of longest 2-CSSs and 3-CSSs of COVID-19 S-proteins, we identified the mutation sites in the S-proteins of two COVID-19 variants, Delta and Omicron. The algorithm-based tool is available for download at https://github.com/lytt0/k-CSS.
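To make the problem statement concrete, the following is a minimal sketch for the two-string case only. It is not the suffix-tree algorithm from the paper and it does not match its complexity bounds; it is a plain dynamic program. It assumes an interpretation that the abstract does not spell out: a k-CSS is read here as at most k non-overlapping sub-strings that occur in the same left-to-right order in both strings, and "longest" means the largest total length. The function name `longest_k_css_length` is hypothetical.

```python
def longest_k_css_length(a: str, b: str, k: int) -> int:
    """Total length of a longest k-CSS of two strings (illustrative only).

    Assumption: a k-CSS is an ordered tuple of at most k non-overlapping
    sub-strings that appear in the same order in both a and b. This is a
    simple O(k * |a| * |b|) dynamic program, not the paper's suffix-tree
    algorithm.
    """
    n, m = len(a), len(b)
    # prev_best[i][j]: best total length using at most t-1 blocks
    # inside the prefixes a[:i] and b[:j]. With 0 blocks it is always 0.
    prev_best = [[0] * (m + 1) for _ in range(n + 1)]
    for _ in range(k):
        # run[i][j]: best total when the last block ends exactly at
        # a[i-1] and b[j-1]; stays 0 when those characters differ.
        run = [[0] * (m + 1) for _ in range(n + 1)]
        # best[i][j]: best total using at most t blocks in a[:i], b[:j].
        best = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if a[i - 1] == b[j - 1]:
                    # Either extend the current block by one character,
                    # or open a new block after at most t-1 earlier ones.
                    run[i][j] = 1 + max(run[i - 1][j - 1],
                                        prev_best[i - 1][j - 1])
                best[i][j] = max(best[i - 1][j], best[i][j - 1], run[i][j])
        prev_best = best
    return prev_best[n][m]
```

Under these assumptions, `longest_k_css_length("abXcd", "abYcd", 1)` returns 2 (a single common sub-string such as "ab"), while `longest_k_css_length("abXcd", "abYcd", 2)` returns 4 (the pair "ab", "cd" separated by one gap), which shows how allowing more sub-strings can bridge a mutation site.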
Keywords
suffix tree, algorithm, common sub-string, COVID-19 variant