Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach
arXiv (2024)
Abstract
In the contemporary era of widespread online recruitment, resume
understanding has been widely acknowledged as a fundamental and crucial task,
which aims to extract structured information from resume documents
automatically. Compared with traditional rule-based approaches, recently
proposed pre-trained document understanding models can greatly improve the
effectiveness of resume understanding. Existing approaches, however, disregard
the hierarchical relations within the structured information in resumes and
have difficulty parsing resumes efficiently. To this end, we propose a novel
model, ERU, to achieve efficient resume understanding. Specifically, we
first introduce a layout-aware multi-modal fusion transformer for encoding the
segments in the resume with integrated textual, visual, and layout information.
Then, we design three self-supervised tasks to pre-train this module on a
large number of unlabeled resumes. Next, we fine-tune the model with a
multi-granularity sequence labeling task to extract structured information from
resumes. Finally, extensive experiments on a real-world dataset clearly
demonstrate the effectiveness of ERU.
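
To make the idea of multi-granularity sequence labeling concrete, the following is a minimal sketch of how segment-level predictions at two granularities could be decoded into hierarchical records. It is not the ERU implementation: the two-level tag scheme (coarse BIO tags such as `B-EDU` for blocks, plain fine tags such as `school` for fields), the `Block` class, and the `decode` function are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One coarse-grained unit, e.g. a single education or work entry (hypothetical)."""
    kind: str
    fields: dict = field(default_factory=dict)


def decode(segments, coarse_tags, fine_tags):
    """Group segments into blocks by coarse BIO tags, then fill fields by fine tags.

    Assumed tag scheme (not taken from the paper):
      coarse_tags: "O", "B-<KIND>" or "I-<KIND>" per segment
      fine_tags:   "O" or a field name per segment
    """
    if not (len(segments) == len(coarse_tags) == len(fine_tags)):
        raise ValueError("segments and both tag sequences must have equal length")

    blocks = []
    current = None
    for text, coarse, fine in zip(segments, coarse_tags, fine_tags):
        if coarse == "O":
            # Segment belongs to no block; close any open block.
            current = None
            continue
        prefix, kind = coarse.split("-", 1)
        # A "B-" tag, or an "I-" tag that does not continue the open block,
        # starts a new block.
        if prefix == "B" or current is None or current.kind != kind:
            current = Block(kind=kind)
            blocks.append(current)
        if fine != "O":
            current.fields.setdefault(fine, []).append(text)
    return blocks


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    segments = ["Jane Doe", "Example University", "B.Sc. Computer Science",
                "Example Corp", "Software Engineer"]
    coarse = ["O", "B-EDU", "I-EDU", "B-WORK", "I-WORK"]
    fine = ["O", "school", "degree", "company", "title"]
    for block in decode(segments, coarse, fine):
        print(block.kind, block.fields)
```

In this sketch the coarse tags carry the hierarchy (which entry a segment belongs to) while the fine tags carry the field type, so fields from different entries are never mixed even when they share a name.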