How To Efficiently Increase Resolution in Neural OCR Models

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)

Cited by 4 | Views 22
Abstract
Modern CRNN OCR models require a fixed line height for all input images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, doing so by simply increasing the line height of the input images without changing the CRNN architecture carries a large cost in memory and computation, both of which scale as O(n^2) with respect to the input line height. We introduce a few very small convolutional and max-pooling layers into a CRNN model to rapidly downsample high-resolution images to a more manageable resolution before passing them to the "base" CRNN model. Doing this greatly improves recognition performance with a very modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9%, when increasing the input resolution from a 30 px line height to a 240 px line height on Open-HART/MADCAT Arabic handwriting data. This is a new state-of-the-art result on Arabic handwriting, and the large improvement over an already strong baseline shows the impact of this technique.
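The core idea described in the abstract is a small downsampling front-end placed ahead of an otherwise unchanged base CRNN. Below is a minimal PyTorch sketch of that idea; the layer count, channel width, the choice to pool both height and width, and the 240 px to 30 px target are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' code): a few small convolution + max-pooling
# layers downsample a high-resolution text-line image before it reaches the
# "base" CRNN, so the base model still sees roughly 30 px-high feature maps.
import torch
import torch.nn as nn

class DownsamplingFrontEnd(nn.Module):
    """Reduce a 240 px-high line image to ~30 px (three 2x pooling steps)."""
    def __init__(self, in_channels: int = 1, width: int = 16):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),   # height 240 -> 120
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),   # height 120 -> 60
            nn.Conv2d(width, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),   # height 60 -> 30
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height=240, width) high-resolution line images
        return self.layers(x)

# Usage: the front-end output feeds the unchanged base CRNN.
frontend = DownsamplingFrontEnd()
lines = torch.randn(4, 1, 240, 960)   # 4 high-resolution line images
features = frontend(lines)            # -> torch.Size([4, 16, 30, 120])
print(features.shape)
```

Because the front-end layers are few and narrow, the extra computation and memory they add is small compared with running the full base CRNN directly on the 240 px-high input, which is the trade-off the abstract highlights.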
Keywords
fixed line height, input resolution, recognition performance, input images, CRNN architecture, computation, input line height, convolutional and max pooling layers, manageable resolution, base CRNN model, memory requirements, neural OCR models, CRNN OCR models, downsample high resolution images