M3ANet: Multi-Modal and Multi-Attention Fusion Network for Ship License Plate Recognition

IEEE Transactions on Multimedia (2024)

Abstract
Ship license plate recognition (SLPR) plays an important role in intelligent waterway management, but little attention has been paid to SLPR in the scene text recognition (STR) community. Inspired by outstanding achievements in STR and taking into account the intrinsic properties of SLPR, we propose a Multi-Modal and Multi-Attention dynamic fusion network (M3ANet) for SLPR in this article. Specifically, visual-language joint modeling for SLPR is developed, and a channel-spatial-self attention dynamic fusion mechanism is proposed to boost accuracy. Linguistic information extracted from a ship-name-related corpus is explicitly fused with visual features to establish a multi-modal recognition network, which improves the adaptability of the recognition model to occlusion, background confusion, blur, etc. Gated fully fusion is used to fuse the visual features re-weighted by the multi-attention components, which provides flexible compatibility with multiple types of decoders and more refined inputs to the recognition decoder. Additionally, to comprehensively mine spatially salient text regions in ship license plate images, we investigate grouped spatial attention. Extensive experiments empirically demonstrate the effectiveness of M3ANet and its superior performance (93.80% on regular images and 90.34% on irregular images) on two benchmarks.
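The abstract does not spell out how the gated fusion is computed, so the following is only an illustrative sketch, not the authors' implementation. It assumes three attention-re-weighted feature maps (hypothetically named channel, spatial, and self) of identical shape, and it applies the gating rule commonly associated with gated fully fusion in the segmentation literature: each branch keeps its own features scaled by (1 + G) and admits gated features from the other branches scaled by (1 - G). The function name `gated_fully_fusion` and the per-branch gate weights are assumptions introduced for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fully_fusion(features, gate_weights):
    """Fuse same-shaped feature maps with per-pixel gates (illustrative only).

    features:     list of arrays, each of shape (C, H, W), e.g. the outputs
                  of hypothetical channel/spatial/self attention branches.
    gate_weights: list of arrays, each of shape (C,), acting like a 1x1
                  convolution that maps C channels to one gate channel.
    Returns a list of fused arrays, one per input branch, each (C, H, W).
    """
    # Gate map per branch: G_i = sigmoid(w_i . X_i), shape (1, H, W).
    gates = [
        sigmoid(np.tensordot(w, x, axes=([0], [0])))[None]
        for x, w in zip(features, gate_weights)
    ]
    # Gated features G_i * X_i; broadcasting expands the gate over channels.
    gated = [g * x for g, x in zip(gates, features)]
    total = sum(gated)

    fused = []
    for x, g, gx in zip(features, gates, gated):
        others = total - gx  # sum over j != i of G_j * X_j
        fused.append((1.0 + g) * x + (1.0 - g) * others)
    return fused


# Usage: three branches with 4 channels on a 2x3 grid.
branches = [np.ones((4, 2, 3)) for _ in range(3)]
weights = [np.zeros(4) for _ in range(3)]  # zero weights give gates of 0.5
outputs = gated_fully_fusion(branches, weights)
```

In this sketch a gate close to 1 lets a branch keep and propagate its own features, while a gate close to 0 makes that position rely more on what the other branches send. How the fused maps are passed on to a decoder is left open here, since the abstract gives no detail on that step.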
Key words
Ship license plate recognition (SLPR), text recognition, attention, language modeling, multi-modal and multi-attention fusion network (M3ANet)