Bangla Optical Character Recognition (OCR) Using Deep Learning Based Image Classification Algorithms

2021 24th International Conference on Computer and Information Technology (ICCIT), 2021

Abstract
Optical Character Recognition (OCR) is the process of converting images of printed, typed, or handwritten text into machine-readable text. OCR is one of the most widely researched topics in computer vision, and highly accurate, sophisticated OCR systems have been built for most of the world's major languages, such as English, French, German, and Mandarin. However, despite having 300 million native speakers (4.00% of the world population) and being the fifth most spoken language in the world, Bengali still does not have a state-of-the-art OCR system. Moreover, most existing systems cannot recognize compound letters. This study addresses the problem by proposing three neural-network-based image classification models for Bangla OCR: Inception V3, VGG-16, and Vision Transformer. The models were trained on the BanglaLekha-Isolated dataset, which contains 98,950 images of Bengali characters (vowels, consonants, digits, and compound letters). On the test set, VGG-16, Inception V3, and Vision Transformer achieve accuracies of 98.65%, 97.82%, and 96.88%, respectively. Each of these models is much more accurate than the existing systems. Real-time implementation of these three models will be instrumental in building a state-of-the-art Bangla OCR system.
Keywords
Deep Learning, Bangla OCR, Optical Character Recognition, OCR, CNN, Inception V3, VGG-16, Vision Transformer, Image Classification
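
The test-set accuracies quoted in the abstract are presumably top-1 classification accuracies, that is, the percentage of test images whose highest-scoring predicted class matches the true character label. The abstract does not state the exact evaluation protocol, so the following is only a minimal sketch under that assumption. The function names `predict_class` and `top1_accuracy` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def predict_class(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class (argmax).

    On ties, the lowest index wins.
    """
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(score_rows: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Percentage of samples whose argmax prediction equals the true label."""
    if len(score_rows) != len(labels):
        raise ValueError("score_rows and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")
    correct = sum(predict_class(row) == label
                  for row, label in zip(score_rows, labels))
    return 100.0 * correct / len(labels)


# Toy data (hypothetical): 4 images, 3 character classes.
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.2, 0.5, 0.3],  # predicted class 1 (true label is 2, so this is an error)
]
toy_labels = [1, 0, 2, 2]

accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"{accuracy:.2f}%")  # prints "75.00%"
```

In practice the score rows would come from the output layer of a trained classifier such as VGG-16, Inception V3, or Vision Transformer, with one score per character class.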