S5TR: Simple Single Stage Sequencer for Scene Text Recognition

Advances in Artificial Intelligence, AI 2023, Pt II (2024)

Abstract
As an active research topic in computer vision, scene text recognition (STR) aims to recognize character sequences in natural scenes. Currently, mainstream STR approaches consist of two main modules: a visual model for feature extraction and a sequence model for text transcription. The two modules operate separately and sequentially, which increases the complexity of the STR model. In this study, we propose a novel Simple Single Stage Sequencer for Scene Text Recognition (S5TR), which transforms text instance images directly into string sequences. Specifically, S5TR contains stacks of Sequencers built from horizontal and vertical Long Short-Term Memory networks (LSTMs). On the one hand, S5TR extracts visual representations of images by modeling long-range dependencies with LSTMs, which play a role similar to self-attention in the Vision Transformer (ViT). On the other hand, the LSTM also serves as a sequence modeling module that captures contextual information within the character sequence when predicting each character. Experimental results demonstrate that S5TR achieves highly competitive performance compared to existing STR methods.
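To make the idea of horizontal and vertical LSTMs concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a feature map stored as nested lists of shape H x W x C, uses untrained random weights, and runs unidirectional LSTMs for brevity; the names `TinyLSTM` and `mix_rows_and_columns` are hypothetical and do not come from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM over a list of vectors (illustrative, untrained)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # Rows are stacked per gate: input, forget, candidate, output.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                  for _ in range(4 * hid_dim)]
        self.b = [0.0] * (4 * hid_dim)

    def run(self, seq):
        n = self.hid_dim
        h = [0.0] * n
        c = [0.0] * n
        outputs = []
        for x in seq:
            xh = x + h  # concatenate input with previous hidden state
            z = [sum(wi * v for wi, v in zip(row, xh)) + b
                 for row, b in zip(self.w, self.b)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


def mix_rows_and_columns(fmap, lstm_h, lstm_v):
    """Map an H x W x C feature map to H x W x (2 * hid_dim)."""
    height, width = len(fmap), len(fmap[0])
    # Horizontal pass: every row is a sequence of W feature vectors.
    horiz = [lstm_h.run(row) for row in fmap]
    # Vertical pass: every column is a sequence of H feature vectors.
    vert = [lstm_v.run([fmap[r][c] for r in range(height)])
            for c in range(width)]
    # Concatenate both views at each spatial position.
    return [[horiz[r][c] + vert[c][r] for c in range(width)]
            for r in range(height)]


# Toy usage: a 2 x 3 feature map with 4 channels and hidden size 5.
rng = random.Random(0)
fmap = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
        for _ in range(2)]
mixed = mix_rows_and_columns(fmap, TinyLSTM(4, 5, rng), TinyLSTM(4, 5, rng))
print(len(mixed), len(mixed[0]), len(mixed[0][0]))
```

Because each hidden state depends on all earlier positions in its row or column, a feature at one end of a text line can influence the representation at the other end, which is the long-range mixing the abstract compares to self-attention. A practical model would use trained, likely bidirectional, LSTMs from a deep learning framework rather than this hand-written loop.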
Keywords
Scene text recognition, Text transcription, LSTM, Neural network