Feature Fusion Pyramid Network for End-to-end Scene Text Detection (Just Accepted)

Yirui Wu, Lilai Zhang, Hao Li, Yunfei Zhang, Shaohua Wan

ACM Transactions on Asian and Low-Resource Language Information Processing (2022)

Abstract

How to properly incorporate text characteristics such as multiple scales, arbitrary orientations, and varying aspect ratios into detection network design has become a hot topic in computer vision. The Feature Pyramid Network (FPN) is a typical method for robust text detection: its low-level and high-level feature maps retain spatial structure and global semantic information, respectively. However, its strict hierarchical structure fails to fuse low-level and high-level information in a way that improves the discriminative ability of the feature maps. To address this problem, we propose a novel feature fusion pyramid network for end-to-end scene text detection that fuses multi-modal information. By dividing the pyramid structure into high-level and low-level layers, we adopt channel and spatial attention modules to enhance the high-level and low-level feature representations by encoding channel-wise and spatial-wise context information, respectively. To reduce information loss during transmission between layers, a special residual network is designed to provide shortcuts between high-level and low-level features, thereby realizing multi-modal feature fusion. Experiments show that the precision and recall of the proposed method reach 88.7%/82.1% on ICDAR2015, 77.0%/60.3% on ICDAR2017-MLT, and 85.3%/74.8% on MSRA-TD500.
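
The abstract names three building blocks: channel attention on high-level pyramid layers, spatial attention on low-level layers, and a residual shortcut that fuses the two. The following is only a minimal NumPy sketch of how such blocks are commonly built, not the authors' implementation. It assumes squeeze-and-excitation style channel attention and CBAM style spatial attention as stand-ins, (C, H, W) feature maps, random placeholder weights, a 1x1 channel projection, and nearest-neighbour 2x upsampling. The function names (channel_attention, spatial_attention, fuse) and all weight shapes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """SE-style channel attention on a (C, H, W) map (assumed stand-in).

    w1: (C // r, C) and w2: (C, C // r) are hypothetical learned weights.
    """
    squeeze = x.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)   # bottleneck FC + ReLU
    scale = sigmoid(w2 @ hidden)             # one weight in (0, 1) per channel
    return x * scale[:, None, None]          # reweight channels


def spatial_attention(x, kernel):
    """CBAM-style spatial attention on a (C, H, W) map (assumed stand-in).

    kernel: (2, k, k) hypothetical conv weights with odd k.
    """
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W) descriptors
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = desc.shape
    logits = np.zeros((h, w))
    for i in range(h):                       # naive "same" convolution
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]   # reweight spatial positions


def fuse(low, high, proj):
    """Residual top-down fusion of a high-level map into a low-level map.

    proj: (C_low, C_high) hypothetical 1x1 conv weights; high is assumed
    to have half the spatial resolution of low.
    """
    projected = np.einsum("oc,chw->ohw", proj, high)    # 1x1 conv
    up = projected.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbour x2
    return low + up                                      # shortcut addition


# Usage with random placeholder weights (illustrative only).
rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))    # low-level map: 8 channels, 16x16
high = rng.standard_normal((16, 8, 8))    # high-level map: 16 channels, 8x8
high_att = channel_attention(high, rng.standard_normal((4, 16)),
                             rng.standard_normal((16, 4)))
low_att = spatial_attention(low, rng.standard_normal((2, 7, 7)))
fused = fuse(low_att, high_att, rng.standard_normal((8, 16)))
print(fused.shape)  # (8, 16, 16)
```

Applying channel attention to the semantically rich high-level maps and spatial attention to the structurally detailed low-level maps mirrors the division described in the abstract; the additive shortcut lets low-level detail pass through unchanged even when the upsampled high-level term is small.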

Keywords

Feature Pyramid Networks, Text Detection, Receptive Fields, Attention Module