LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training
CoRR (2024)
Abstract
Table structure recognition (TSR) aims to convert tables in images into
machine-understandable formats. Recent methods solve this problem either by
predicting the adjacency relations of detected cell boxes or by learning to
generate the corresponding markup sequences directly from table images.
However, existing approaches either rely on additional heuristic rules to
recover the table structures or struggle to capture long-range dependencies
within tables, which increases their complexity. In this paper, we propose an
alternative paradigm. We model TSR as a logical location regression problem and
propose a new TSR framework called LORE, short for LOgical location REgression
network, which for the first time regresses the logical locations as well as
the spatial locations of table cells in a unified network. LORE is conceptually
simpler, easier to train, and more accurate than other TSR paradigms.
Moreover, inspired by the remarkable success of pre-trained models on a range
of computer vision and natural language processing tasks, we propose two
pre-training tasks that enrich the spatial and logical representations at the
feature level of LORE, resulting in an upgraded version called LORE++.
Pre-training gives LORE++ significant advantages over its predecessor,
substantially improving accuracy, generalization, and few-shot capability.
Experiments on standard benchmarks against methods of previous paradigms
demonstrate the superiority of LORE++, highlighting the potential of the
logical location regression paradigm for TSR.
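To illustrate why regressing logical locations avoids heuristic post-processing: once every cell carries a logical location, the table markup can be read off directly. The sketch below assumes each logical location is the quadruple (start row, end row, start column, end column), 0-based and inclusive; `Cell` and `cells_to_html` are hypothetical names, and this is an illustration of the idea, not LORE's implementation.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    """A cell with an assumed logical location (0-based, inclusive indices)."""
    start_row: int
    end_row: int
    start_col: int
    end_col: int
    text: str = ""


def cells_to_html(cells: list[Cell]) -> str:
    """Render cells as an HTML table using only their logical locations.

    Each cell is emitted in the row where it starts, ordered by start column;
    rowspan/colspan follow directly from the index ranges, so no adjacency
    rules are needed to recover the structure.
    """
    n_rows = max(c.end_row for c in cells) + 1
    # Bucket cells by the row in which they start.
    rows: list[list[Cell]] = [[] for _ in range(n_rows)]
    for c in cells:
        rows[c.start_row].append(c)

    parts = ["<table>"]
    for row in rows:
        parts.append("<tr>")
        for c in sorted(row, key=lambda x: x.start_col):
            rowspan = c.end_row - c.start_row + 1
            colspan = c.end_col - c.start_col + 1
            attrs = ""
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            parts.append(f"<td{attrs}>{escape(c.text)}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)
```

For example, a header cell spanning two columns above two body cells, `[Cell(0, 0, 0, 1, "Name"), Cell(1, 1, 0, 0, "a"), Cell(1, 1, 1, 1, "b")]`, renders as `<table><tr><td colspan="2">Name</td></tr><tr><td>a</td><td>b</td></tr></table>`.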