From HMMs to RNNs: Computer-Assisted Transcription of a Handwritten Notarial Records Collection

2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018

Abstract
We present the process being followed to transcribe a large 18th-century manuscript collection with the help of Handwritten Text Recognition (HTR) technology. The documents are processed in batches of 50 pages each. For each batch we perform two semi-supervised processes: one to analyze the layout and detect the text lines, and another to produce the full transcripts of the text. At the users' request, diplomatic and modernized transcripts, as well as semantically tagged versions, are being produced. Layout analysis is supervised with a conventional layout editing tool. Transcripts, including automatic modernization and tagging, are produced with a web-based computer-assisted interactive-predictive tool (CATTI). We report the performance of this process over the 12 image batches processed so far. These results show the impact of a technological transition in optical modelling: from classical HMM-based methods to newer methods based on recurrent neural networks.
Keywords
handwritten text recognition, hidden Markov model, document layout analysis, recurrent neural networks, computer-assisted transcription