Validation of document image defect models for optical character recognition

Proc. of 3rd Annual Symposium on Document Analysis and Information Retrieval(1994)

Abstract
In this paper we consider the problem of evaluating models for physical defects affecting the optical character recognition (OCR) process. While a number of such models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. We introduce a rigorous and more pragmatic definition of when a model is accurate: we say a defect model is validated if the OCR errors induced by the model are effectively indistinguishable from the errors encountered when using real scanned documents. We present two measures to quantify this similarity: the Vector Space method and the Coin Bias method. The former adapts an approach used in information retrieval, the latter simulates an observer attempting to do better than a "random" guesser. We compare and contrast the two techniques based on experimental data; both seem to work well, suggesting this is an appropriate formalism for the development and evaluation of document image defect models.
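
The abstract does not spell out how the Vector Space method is computed. As a rough illustration only, the sketch below assumes it resembles the cosine-similarity comparison common in information retrieval: OCR errors from real scans and from model-degraded images are each tallied into a count vector, and the two vectors are compared by the cosine of the angle between them. The function name `cosine_similarity`, the (true character, recognized character) error representation, and the counts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (dict-like)."""
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty error profile is treated as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical confusion counts, keyed by (true character, recognized character).
real_errors = Counter({("e", "c"): 40, ("l", "1"): 25, ("m", "rn"): 10})
synthetic_errors = Counter({("e", "c"): 38, ("l", "1"): 30, ("o", "0"): 5})

similarity = cosine_similarity(real_errors, synthetic_errors)
print(f"error-profile similarity: {similarity:.3f}")
```

Under this assumption, a value near 1 would indicate that the defect model induces an error profile close to that of real scanned documents, while a value near 0 would indicate little overlap.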