
PARSTR: partially autoregressive scene text recognition

International Journal on Document Analysis and Recognition (IJDAR) (2024)

Abstract
An autoregressive (AR) decoder for scene text recognition (STR) requires numerous generation steps to decode a text image character by character but can yield high recognition accuracy. On the other hand, a non-autoregressive (NAR) decoder generates all characters in a single generation but suffers from a loss of recognition accuracy. This is because, unlike the former, the latter assumes that the predicted characters are conditionally independent. This paper presents a Partially Autoregressive Scene Text Recognition (PARSTR) method that unifies both AR and NAR decoding within the same model. To reduce decoding steps while maintaining recognition accuracy, we devise two decoding strategies: b-first and b-ahead, reducing the decoding steps to approximately b and by a factor of b, respectively. The experimental results demonstrate that our PARSTR models using the devised decoding strategies present a balanced compromise between efficiency and recognition accuracy compared to the fully AR and NAR decoding approaches. Specifically, the experimental results on public benchmark STR datasets demonstrate the potential to reduce decoding steps down to at most five steps and by a factor of five under the b-first and b-ahead decoding schemes, respectively, while having a slight reduction of total word recognition accuracy of less than or equal to 0.5
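The step counts described in the abstract can be illustrated with a small sketch. The abstract does not spell out the two schemes in detail, so the interpretation here is an assumption: b-first is taken to decode the first b characters one at a time and the remainder in a single parallel step, and b-ahead is taken to emit b characters per step. The function names are hypothetical, and end-of-sequence handling is ignored.

```python
import math


def ar_steps(length: int) -> int:
    """Fully autoregressive decoding: one step per character."""
    return length


def nar_steps(length: int) -> int:
    """Fully non-autoregressive decoding: all characters in one step."""
    return 1 if length > 0 else 0


def b_first_steps(length: int, b: int) -> int:
    """Assumed b-first scheme: the first b characters are decoded one per
    step, then all remaining characters are decoded in one parallel step."""
    if length <= b:
        return length
    return b + 1


def b_ahead_steps(length: int, b: int) -> int:
    """Assumed b-ahead scheme: each step decodes b characters at once,
    so the step count shrinks by roughly a factor of b."""
    return math.ceil(length / b)


if __name__ == "__main__":
    word_length = 10
    for b in (2, 3, 5):
        print(
            f"b={b}: AR={ar_steps(word_length)}, "
            f"b-first={b_first_steps(word_length, b)}, "
            f"b-ahead={b_ahead_steps(word_length, b)}, "
            f"NAR={nar_steps(word_length)}"
        )
```

Under these assumptions, the b-first count stays near b regardless of word length, while the b-ahead count scales with word length divided by b, which matches the "approximately b" and "by a factor of b" descriptions in the abstract.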
Keywords
Scene text recognition (STR), Autoregressive (AR), Non-autoregressive (NAR) decoder, Partially autoregressive (PAR) decoder