Feedback Learning: Automating the Process of Correcting and Completing the Extracted Information

2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), 2019

Abstract
In recent years, with the increasing use of digital media and advances in deep learning architectures, most paper-based documents have been converted into digital versions. These advances have helped state-of-the-art information extraction and digital mailroom technologies become progressively more efficient. Even though many efficient post-Information Extraction (IE) error rectification methods have been introduced in the recent past to improve the quality of digitized documents, they remain imperfect and need improvement in context-based error correction, specifically when dealing with documents that contain sensitive information, such as invoices. This paper describes a self-correction approach based on sequence-to-sequence Neural Machine Translation (NMT), applied to rectify errors in the results of any information extraction approach, such as Optical Character Recognition (OCR). We accomplish this by exploiting the concepts of sequence learning with the help of feedback provided during each cycle of training. Finally, we compare state-of-the-art post-OCR error correction methods with our feedback learning approach. In our empirical results, the feedback learning approach outperforms state-of-the-art post-OCR error correction methods.
Keywords
Document Understanding, Post IE Error Correction and Completeness, Sequence to Sequence Neural Machine Translation