Numerical Formula Recognition from Tables

Knowledge Discovery and Data Mining(2021)

引用 2|浏览35
暂无评分
摘要
ABSTRACTClaims over the numerical relationships among some measures are commonly expressed in tabular forms, and widely exist in the published documents on the Web. This paper introduces the problem of numerical formula recognition from tables, namely recognizing all numerical formulas inside a given table. It can well support many interesting downstream applications, such as numerical error correction in tables, formula recommendation in tables. Here, we emphasize that table is a kind of language that adopts a different linguistic paradigm from natural language. It uses visual grammar like visual layout and visual settings (e.g., indentation, font style) to express the grammatical relationships among the table cells. Understanding tables and recognizing formulas require decoding the visual grammar while simultaneously understanding the textual information. Another challenge is that formulas are complicated in terms of diverse math functions and variable-length of arguments. To address these challenges, we convert this task into a uniform framework, extracting relations of table cell pairs in a table. A two-channel neural network model TaFor is proposed to embed both the textual and visual features for a table cell. Our framework achieves the formula-level F1-score = 0.90 on a real-world dataset of 190179 tables while a retrieval-based method achieves F1-score = 0.72. We also perform extensive experiments to demonstrate the effectiveness of each component in our model, and conduct a case study to discuss the limits of the proposed model. With our published data this study also aims to attract the community's interest in deep semantic understanding over tables.
更多
查看译文
关键词
numerical formulas, tabular presentation, table understanding
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要