GetPt: Graph-enhanced General Table Pre-training with Alternate Attention Network

Ran Jia, Haoming Guo, Xiaoyuan Jin, Chao Yan, Lun Du, Xiaojun Ma, Tamara Stankovic, Marko Lozajic, Goran Zoranovic, Igor Ilic, Shi Han, Dongmei Zhang

KDD (2023)

Abstract
Tables are widely used for data storage and presentation due to their high flexibility in layout. The importance of tables as information carriers and the complexity of tabular data understanding have attracted a great deal of research on large-scale pre-training for tabular data. However, most existing works design models for specific types of tables, such as relational tables and tables with well-structured headers, neglecting tables with complex layouts. In real-world scenarios, many such tables fall beyond the scope of previous research and are thus not well supported. In this paper, we propose GetPt, a unified pre-training architecture for general table representation that is applicable even to tables with complex structures and layouts. First, we convert a table to a heterogeneous graph that represents its layout. Based on this graph, a specially designed transformer jointly models the semantics and structure of the table. Second, we devise an Alternate Attention Network (AAN) to better model contextual information across multiple granularities of a table, including tokens, cells, and the table itself. To better support a wide range of downstream tasks, we further employ three pre-training objectives and pre-train the model on a large table dataset. We fine-tune and evaluate the GetPt model on two representative tasks: table type classification and table structure recognition. Experiments show that GetPt outperforms existing state-of-the-art methods on these tasks.
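The abstract does not include implementation details, so the sketch below is a hypothetical illustration of the two core ideas rather than GetPt's actual architecture. table_to_graph builds a toy heterogeneous graph over cells with same-row and same-column edge types (the paper's real graph schema is not specified here), and AlternateAttention alternates self-attention between the token granularity (within a cell) and the cell granularity (across the table), in the spirit of the described AAN. All function names, the edge typing, and the mean-pooling step are assumptions.

import torch
import torch.nn as nn

def table_to_graph(cells):
    """Build a toy heterogeneous graph over table cells.
    cells: list of dicts {"row": r, "col": c, "text": s}.
    Returns an edge list plus an edge-type id per edge
    (0 = same-row, 1 = same-column)."""
    edges, edge_types = [], []
    for i, a in enumerate(cells):
        for j, b in enumerate(cells):
            if i == j:
                continue
            if a["row"] == b["row"]:
                edges.append((i, j)); edge_types.append(0)
            elif a["col"] == b["col"]:
                edges.append((i, j)); edge_types.append(1)
    return edges, edge_types

class AlternateAttention(nn.Module):
    """One alternate-attention layer: tokens attend within their cell,
    then pooled cell vectors attend across the whole table."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.token_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cell_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tok):
        # tok: (num_cells, tokens_per_cell, dim); cells act as the batch.
        tok, _ = self.token_attn(tok, tok, tok)          # token granularity
        cell = tok.mean(dim=1).unsqueeze(0)              # (1, num_cells, dim)
        cell, _ = self.cell_attn(cell, cell, cell)       # cell granularity
        # Broadcast the table-level cell context back to each cell's tokens.
        return tok + cell.squeeze(0).unsqueeze(1)

cells = [{"row": 0, "col": 0, "text": "Region"},
         {"row": 0, "col": 1, "text": "Sales"},
         {"row": 1, "col": 0, "text": "EMEA"},
         {"row": 1, "col": 1, "text": "1200"}]
edges, types = table_to_graph(cells)                     # layout edges
out = AlternateAttention(dim=32)(torch.randn(4, 5, 32))  # 4 cells, 5 tokens

Alternating the two attention scopes lets a token receive context from beyond its own cell without computing full token-to-token attention over the entire table, which is presumably why the paper models the granularities separately.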
Keywords
table pre-training,graph transformer,table understanding