REmatch: a novel regex engine for finding all matches.

Cristian Riveros, Nicolás Van Sint Jan, Domagoj Vrgoc

Proc. VLDB Endow. (2023)

Abstract
In this paper, we present the REmatch system for information extraction. REmatch is based on a recently proposed enumeration algorithm for evaluating regular expressions with capture variables supporting the all-match semantics. The paper tells a story of what it takes to make a theoretically optimal algorithm work in practice. As we show here, a naive implementation of the original algorithm would have a hard time dealing with realistic workloads. We thus develop a new algorithm and a series of optimizations that make REmatch as fast as or faster than many popular RegEx engines while at the same time being able to return all the outputs: a task that most other engines tend to struggle with.
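To make the contrast between all-match semantics and conventional engines concrete, the following is a minimal sketch in Python. It is not REmatch's algorithm or API: it assumes, for illustration only, that "all matches" means every non-empty span of the input that the pattern matches in full, and it ignores capture variables. The function names `leftmost_matches` and `all_matches` are hypothetical, and the brute-force enumeration is quadratic in the input length, which is the kind of cost an enumeration algorithm is meant to avoid.

```python
import re


def leftmost_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Spans reported by re.finditer: leftmost, non-overlapping matches only."""
    return [m.span() for m in re.finditer(pattern, text)]


def all_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Every non-empty span (i, j) such that text[i:j] matches the pattern in full.

    Brute force over all substrings; an illustration of the semantics,
    not of how REmatch computes its output.
    """
    compiled = re.compile(pattern)
    n = len(text)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if compiled.fullmatch(text, i, j)
    ]


if __name__ == "__main__":
    print(leftmost_matches(r"a+", "aab"))  # [(0, 2)]
    print(all_matches(r"a+", "aab"))       # [(0, 1), (0, 2), (1, 2)]
```

On the input `aab`, the standard engine reports a single span, while the all-match view also includes the overlapping and nested spans that a leftmost, non-overlapping scan skips.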
Keywords
novel regex engine, finding