Text Role Classification in Scientific Charts Using Multimodal Transformers
CoRR (2024)
Abstract
Text role classification involves classifying the semantic role of textual
elements within scientific charts. For this task, we propose to finetune two
pretrained multimodal document layout analysis models, LayoutLMv3 and UDOP, on
chart datasets. The transformers utilize the three modalities of text, image,
and layout as input. We further investigate whether data augmentation and
balancing methods improve the performance of the models. The models are evaluated
on various chart datasets, and results show that LayoutLMv3 outperforms UDOP in
all experiments. LayoutLMv3 achieves the highest F1-macro score of 82.87 on the
ICPR22 test dataset, beating the best-performing model from the ICPR22
CHART-Infographics challenge. Moreover, the robustness of the models is tested
on a synthetic noisy dataset ICPR22-N. Finally, the generalizability of the
models is evaluated on three chart datasets, CHIME-R, DeGruyter, and EconBiz,
for which we added labels for the text roles. Findings indicate that even when
training data is limited, transformers can be applied with the help of data
augmentation and balancing methods. The source code and datasets are available
on GitHub at
https://github.com/hjkimk/text-role-classification
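
The abstract reports results as F1-macro scores. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the function below averages per-class F1 with equal weight for every class, so rare text roles count as much as frequent ones. The role names in the example are hypothetical placeholders rather than the label set of any dataset named above, and the function returns a value in [0, 1], whereas the reported 82.87 appears to be on a 0-100 scale.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one sample is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative

    # Per-class F1 = 2*TP / (2*TP + FP + FN); every label in the union has
    # at least one count, so the denominator is never zero.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical role labels, chosen only for illustration.
y_true = ["tick_label"] * 4 + ["axis_title", "legend_label"]
y_pred = ["tick_label"] * 4 + ["axis_title", "tick_label"]
score = macro_f1(y_true, y_pred)
```

In this example five of six predictions are correct, yet the single missed `legend_label` gives that class an F1 of zero and pulls the macro average well below the accuracy. This sensitivity to minority classes is one reason balancing methods can matter for a macro-averaged metric.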