1990 US Census Form Recognition Using CTC Network, WFST Language Model, and Surname Correction

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1 (2017)

Abstract
This paper presents a system for transcribing 1990 US census forms. Extracting information from census forms is useful for building genealogy databases and for better archiving of the forms. We trained CTC/LSTM-RNN networks as our OCR engine and addressed the major challenge in language modeling by defining syntactic constraints with WFST language models. The paper makes two main technical contributions. First, our system automatically transcribed 1990 US census forms with compelling accuracy for the first time, which can be useful for downstream studies of the information extracted from the forms. Second, we designed a novel post-processing algorithm that significantly improved the recognition accuracy of surnames.
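The abstract does not describe the decoding or the surname post-processing in detail, so the following is only a minimal illustrative sketch, not the paper's algorithm. It shows the standard CTC best-path rule (collapse repeated labels, then drop blanks) followed by a simple lexicon lookup by edit distance as one plausible form of surname correction. The blank index, the toy alphabet, the lexicon, and the `max_distance` threshold are all assumptions introduced for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_labels, alphabet):
    """Collapse repeated labels, then drop blanks (standard CTC best-path rule)."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != BLANK)


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def correct_surname(raw, lexicon, max_distance=2):
    """Return the closest lexicon entry, or the raw string if none is near enough.

    Hypothetical stand-in for a surname correction step; the threshold is an assumption.
    """
    best = min(lexicon, key=lambda name: edit_distance(raw, name))
    return best if edit_distance(raw, best) <= max_distance else raw


# Toy data: per-frame argmax labels from a hypothetical recognizer.
alphabet = ["", "S", "M", "I", "T", "H", "E"]
frames = [1, 1, 0, 2, 3, 3, 0, 4, 6, 0]
lexicon = ["SMITH", "SMYTHE", "JONES"]

raw = ctc_greedy_decode(frames, alphabet)
print(raw, "->", correct_surname(raw, lexicon))
```

In a full system the lexicon lookup would more likely be folded into the WFST decoding graph rather than applied as a separate string-matching pass; the separate function here is only meant to keep the two ideas easy to read.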
Keywords
handwriting recognition, weighted finite state transducer, connectionist temporal classification