Space Anomalies in OCRs for Arabic Like Scripts

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)(2018)

引用 3|浏览16
暂无评分
摘要
This paper investigates and analyses the nature of errors occurring in Optical Character Recognition (OCR) for Arabic-like scripts. Existing research on the area of OCR for Arabic-like scripts often focuses on achieving the best performance in terms of character error rates. Only little effort targets at the analysis of the nature of these errors (anomalies) that may occur. One such important anomaly is Space Anomaly. This anomaly is due to the presence of breaker characters that are an essential part of Arabic-like scripts. The spaces introduced by breaker characters are not depicted in the ground truth making it hard for OCR to generalize. The OCR model either learns to inhibit the original spaces or to generate extra spaces at places where they are not correct. Due to this confusion, the rendering looks sub-optimal. This analyses and removes space anomalies. We present a joint approach that does not only perform OCR but also handles the space anomalies in a robust manner, hence significantly outperforming the state-of-the-art. Although the implication of the work is shown by improved character recognition rate, the impact of this research is much higher in terms of the correctness of the OCR for useful purposes, especially for rendering. The claim is supported by empirical evaluation and it is shown that the proposed approach achieved the best results.
更多
查看译文
关键词
Arabic Script,Pashto,Space Anomaly,OCR,MDLSTM
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要