Reducing the Human Effort in Text Line Segmentation for Historical Documents

DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT III (2021)

Abstract
Labeling the layout of historical documents to prepare training data for machine learning techniques is an arduous task that requires great human effort. A draft of the layout can be obtained with a document layout analysis (DLA) system, and the user can then correct it with less effort than labeling from scratch. In this paper we study an iterative process in which the user supervises and corrects the draft only for the pages automatically selected by the DLA system, with the aim of reducing the required human effort. The results show that similar DLA quality can be achieved while reducing the number of pages that the user has to annotate, and that the accumulated human effort required to obtain the layout of the pages used to train the models can be reduced by more than 95%.
Keywords
Document layout analysis, Text line segmentation, Human effort reduction, Historical document