Feature Fusion Pyramid Network for End-to-end Scene Text Detection (Just Accepted)

ACM Transactions on Asian and Low-Resource Language Information Processing (2022)

Abstract
How to properly incorporate the characteristics of scene text, such as multiple scales, arbitrary orientations, and large aspect ratios, into the design of detection networks has become a hot topic in computer vision. The Feature Pyramid Network (FPN) is a typical method for achieving robust text detection: its low-level and high-level feature maps retain spatial structure and global semantic information, respectively. However, its strict hierarchical structure fails to fuse low-level and high-level information in a way that improves the discriminative ability of the feature maps. To address this problem, we propose a novel feature fusion pyramid network for end-to-end scene text detection that fuses multi-modal information. By dividing the pyramid structure into high-level and low-level layers, channel and spatial attention modules are adopted to enhance the high-level and low-level feature representations by encoding channel-wise and spatial-wise context information, respectively. To reduce the information lost as features are transmitted across layers, a special residual network is designed to provide shortcuts between high-level and low-level features, thereby realizing multi-modal feature fusion. Experiments show that the precision/recall of the proposed method on the ICDAR2015, ICDAR2017-MLT, and MSRA-TD500 datasets reach 88.7%/82.1%, 77.0%/60.3%, and 85.3%/74.8%, respectively.
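The abstract does not spell out the exact attention modules or the residual shortcut, so the following is only a minimal sketch of the general idea under stated assumptions, written in plain Python on tiny toy tensors so that it runs without a deep-learning framework. It assumes SE-style (squeeze-and-excitation) channel attention for the high-level map and a simplified CBAM-style spatial attention for the low-level map, with a scalar 1x1 mix standing in for CBAM's 7x7 convolution. The functions `channel_attention`, `spatial_attention`, `upsample2x`, and `fuse`, the additive identity shortcut inside `fuse`, and all weights are hypothetical illustrations, not the authors' design or trained parameters. The `h_mean` helper computes the F-measure (harmonic mean) implied by the precision/recall pairs quoted above; those H-mean values are derived here, not reported in the abstract.

```python
import math
from typing import List

# Toy feature map layout: fmap[channel][row][col] (plain Python lists, no framework).
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Assumed SE-style channel attention: squeeze, excite, rescale channels."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: hidden = ReLU(w1 @ z), gates = sigmoid(w2 @ hidden).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale every value in channel c by gate c.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, fmap)]


def spatial_attention(fmap: FeatureMap, a: float, b: float, bias: float) -> FeatureMap:
    """Assumed, simplified CBAM-style spatial attention: per-pixel mean and max
    across channels, mixed by scalar weights (a 1x1 stand-in for CBAM's 7x7
    convolution), then a sigmoid mask applied to every channel."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(a * sum(fmap[k][i][j] for k in range(c)) / c
                     + b * max(fmap[k][i][j] for k in range(c)) + bias)
             for j in range(w)] for i in range(h)]
    return [[[mask[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in fmap]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling, as used in FPN's top-down pathway."""
    return [[[v for v in row for _ in range(2)] for row in ch for _ in range(2)]
            for ch in fmap]


def fuse(low: FeatureMap, high: FeatureMap, w1: Matrix, w2: Matrix,
         a: float, b: float, bias: float) -> FeatureMap:
    """Hypothetical fusion step: spatially attended low-level map
    + upsampled, channel-attended high-level map + identity shortcut.
    Assumes both levels already share the same channel count and that
    `high` has half the spatial size of `low`."""
    low_att = spatial_attention(low, a, b, bias)
    high_up = upsample2x(channel_attention(high, w1, w2))
    return [[[x + y + s for x, y, s in zip(r1, r2, r3)]
             for r1, r2, r3 in zip(c1, c2, c3)]
            for c1, c2, c3 in zip(low_att, high_up, low)]


def h_mean(precision: float, recall: float) -> float:
    """F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    low = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]   # 2 channels, 4x4
    high = [[[2.0] * 2 for _ in range(2)] for _ in range(2)]  # 2 channels, 2x2
    w1 = [[0.5, 0.5]]       # toy bottleneck: 2 channels -> 1 hidden unit
    w2 = [[1.0], [1.0]]     # toy expansion: 1 hidden unit -> 2 channel gates
    out = fuse(low, high, w1, w2, a=1.0, b=1.0, bias=0.0)
    print(len(out), len(out[0]), len(out[0][0]))  # 2 4 4

    # H-mean derived from the precision/recall pairs quoted in the abstract.
    for name, p, r in [("ICDAR2015", 0.887, 0.821),
                       ("ICDAR2017-MLT", 0.770, 0.603),
                       ("MSRA-TD500", 0.853, 0.748)]:
        print(f"{name}: H-mean {h_mean(p, r):.1%}")
```

With these toy weights every fused value equals 1 + 3 * sigmoid(2), roughly 3.642, and the derived H-means print as 85.3%, 67.6%, and 79.7%. Keeping the raw low-level map in the sum is one simple way to express a residual shortcut; a real implementation would use learned convolutions and a framework such as PyTorch.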
Keywords
Feature Pyramid Networks,Text Detection,Receptive Fields,Attention Module