Transcription Free LSTM OCR Model Evaluation

2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018

Abstract
In recent years there has been significant progress in the field of Optical Character Recognition (OCR), mainly due to the use of various LSTM-based architectures. In the classic supervised training setup for LSTM-based OCR, the available image data and the corresponding transcriptions are split into a training, a validation, and a test set. Generating these transcriptions can be very costly, especially for historical documents, so it is desirable either to minimize the amount of transcribed data required or to maximize the size of the training set in order to obtain better models. We propose a novel method to evaluate LSTM OCR models without requiring transcription ground truth data. For this we employ a second LSTM in an encoder-decoder setup to recreate the image data from the OCR output, and we evaluate the model based on the difference between the recreated image and the original input. We show that this approach performs similarly to traditional transcription-based evaluation on a historical document from the 16th century.
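The evaluation idea can be illustrated with a minimal sketch. It does not implement the decoder LSTM. It assumes the recreated line images are already available and shows only the scoring step. The abstract does not say which difference measure is used, so the mean squared pixel difference here is an assumption, and all function names (`reconstruction_error`, `score_model`, `best_model`) are hypothetical.

```python
from statistics import mean


def reconstruction_error(original, reconstructed):
    """Mean squared pixel difference between two equally sized grayscale
    images, each given as a list of rows of floats.

    Assumption: the abstract does not name the difference measure;
    mean squared error is used here purely for illustration.
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same height")
    squared = []
    for row_o, row_r in zip(original, reconstructed):
        if len(row_o) != len(row_r):
            raise ValueError("images must have the same width")
        squared.extend((o - r) ** 2 for o, r in zip(row_o, row_r))
    return mean(squared)


def score_model(originals, reconstructions):
    """Average reconstruction error over all line images; lower is better.

    `reconstructions` stands in for the images a decoder would recreate
    from the OCR output (hypothetical input, not produced here).
    """
    return mean(
        reconstruction_error(o, r) for o, r in zip(originals, reconstructions)
    )


def best_model(scores):
    """Return the name of the model with the lowest score."""
    return min(scores, key=scores.get)
```

Under these assumptions, candidate OCR models would be compared by their `score_model` values, with no transcription entering the comparison. Whether such a score tracks the character error rate is the empirical question the paper addresses.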
Keywords
OCR, LSTM, Evaluation