The Impact of Visual Similarities of Arabic-Like Scripts Regarding Learning in an OCR System

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017

Abstract
Many languages, including Urdu, Pashto, and Persian, use the Arabic script for written communication, either in its basic form or in an augmented form. Because the primary characters are shared among all of these languages, their visual similarities can be exploited for Optical Character Recognition (OCR). OCR models optimized for individual languages have been proposed. However, to the best of our knowledge, no attempt has been made to develop a single system for more than one language. The contributions of the presented work are threefold. First, it investigates the effect on recognition accuracy when different languages are combined (a pioneering study). Second, it introduces publicly available synthetic datasets for the Arabic and Pashto languages for experimental purposes. Third, it provides statistical analysis as clues for transfer learning in OCR systems for the Arabic, Urdu, and Pashto languages.
Keywords
Arabic script based languages, Generalized OCR, MDLSTM, CTC