Computing Minimal Absent Words and Extended Bispecial Factors with CDAWG Space

CoRR (2024)

Abstract
A string w is said to be a minimal absent word (MAW) for a string S if w does not occur in S and every proper substring of w occurs in S. We focus on non-trivial MAWs, which are those of length at least 2. Finding such non-trivial MAWs for a given string is motivated by applications in bioinformatics and data compression. Fujishige et al. [TCS 2023] proposed a data structure of size Θ(n) that can output the set 𝖬𝖠𝖶(S) of all MAWs for a given string S of length n in O(n + |𝖬𝖠𝖶(S)|) time, based on the directed acyclic word graph (DAWG). In this paper, we present a more space-efficient data structure based on the compact DAWG (CDAWG), which can output 𝖬𝖠𝖶(S) in O(|𝖬𝖠𝖶(S)|) time with O(e) space, where e denotes the minimum of the sizes of the CDAWGs for S and for its reversal S^R. For any string of length n, it holds that e < 2n, and for highly repetitive strings e can be sublinear (up to logarithmic) in n. We also show that MAWs and their generalization, minimal rare words, are closely related to extended bispecial factors via the CDAWG.
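To make the definition concrete, here is a minimal brute-force sketch in Python that enumerates the non-trivial MAWs of a short string. It is not the CDAWG-based method of the paper: it stores every substring explicitly and is only meant for small inputs. The function name `minimal_absent_words` and its `alphabet` parameter are illustrative assumptions, not names from the paper. It relies on the observation that a word a·u·b (a, b single characters) is a MAW exactly when a·u and u·b both occur in S while a·u·b does not.

```python
def minimal_absent_words(s, alphabet=None):
    """Brute-force enumeration of non-trivial MAWs (length >= 2) of s.

    Illustrative only: uses a set of all substrings, so it needs
    quadratic-or-worse space and is unrelated to the CDAWG structure.
    """
    if alphabet is None:
        # Characters not occurring in s can only yield trivial MAWs (length 1),
        # so restricting to the characters of s loses no non-trivial MAW.
        alphabet = sorted(set(s))
    n = len(s)
    # Every substring of s, including the empty string.
    factors = {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}

    maws = set()
    for au in factors:
        if not au:
            continue  # need a leading character a
        u = au[1:]  # drop the first character a
        for b in alphabet:
            # a+u occurs (au is in factors); check that u+b occurs but a+u+b does not.
            if u + b in factors and au + b not in factors:
                maws.add(au + b)
    return sorted(maws)


if __name__ == "__main__":
    print(minimal_absent_words("abaab"))
```

For S = "abaab", for example, "bab" is reported because "ba" and "ab" both occur in S while "bab" does not.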