MAFormer: A transformer network with multi-scale attention fusion for visual recognition

Neurocomputing (2024)

Abstract
Vision Transformer and its variants have demonstrated great potential in various computer vision tasks. However, conventional vision transformers often focus on global dependency at a coarse level, which makes it difficult to learn global relationships and fine-grained representations at the token level. In this paper, we introduce Multi-scale Attention Fusion into the transformer (MAFormer), which explores local aggregation and global feature extraction in a dual-stream framework for visual recognition. We develop a simple but effective module that exploits the full potential of transformers for visual representation by learning fine-grained and coarse-grained features at the token level and dynamically fusing them. Our Multi-scale Attention Fusion (MAF) block consists of: i) a local window attention branch that learns short-range interactions within windows, aggregating fine-grained local features; ii) global feature extraction through a novel Global Learning with Down-sampling (GLD) operation that efficiently captures long-range context information across the whole image; iii) a fusion module that learns to integrate both features via attention. MAFormer achieves state-of-the-art results on several common vision tasks. In particular, MAFormer-L achieves 85.9% Top-1 accuracy on ImageNet, surpassing CSWin-B and LV-ViT-L by 1.7% and 0.6%, respectively. On MSCOCO, MAFormer outperforms the prior art CSWin by 1.7% mAP on object detection and 1.4% on instance segmentation with a similar parameter count. With this performance, MAFormer demonstrates the ability to generalize across various visual benchmarks and shows promise as a general backbone for different self-supervised pre-training tasks in the future.
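The abstract describes the MAF block as three parts: windowed local attention, a down-sampled global branch (GLD), and an attention-based fusion of the two streams. The PyTorch sketch below illustrates one plausible reading of that structure; it is not the authors' code. The module names, the average-pooling down-sampler inside GLD, the non-overlapping window partition, and the cross-attention fusion are all assumptions made for illustration, since the abstract specifies the components only at a high level.

import torch
import torch.nn as nn


class GlobalLearningWithDownsampling(nn.Module):
    """Sketch of the GLD branch (assumed form): down-sample the token grid,
    then attend from full-resolution queries to the pooled keys/values."""

    def __init__(self, dim, num_heads, pool_size=2):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size)  # assumed down-sampling operator
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, h, w):
        # x: (B, N, C) tokens arranged on an h x w grid, N = h * w
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, h, w)
        pooled = self.pool(grid).flatten(2).transpose(1, 2)  # (B, N', C)
        out, _ = self.attn(query=x, key=pooled, value=pooled)
        return out


class MAFBlock(nn.Module):
    """Sketch of a Multi-scale Attention Fusion block: a local window-attention
    branch, a GLD global branch, and an attention-based fusion of the two."""

    def __init__(self, dim, num_heads, window_size=7):
        super().__init__()
        self.window_size = window_size
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_branch = GlobalLearningWithDownsampling(dim, num_heads)
        self.fuse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, h, w):
        b, n, c = x.shape
        ws = self.window_size
        # Local branch: self-attention within non-overlapping ws x ws windows.
        grid = x.reshape(b, h, w, c)
        wins = grid.reshape(b, h // ws, ws, w // ws, ws, c)
        wins = wins.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)
        local, _ = self.local_attn(wins, wins, wins)
        local = local.reshape(b, h // ws, w // ws, ws, ws, c)
        local = local.permute(0, 1, 3, 2, 4, 5).reshape(b, n, c)
        # Global branch: long-range context via down-sampled attention.
        glob = self.global_branch(x, h, w)
        # Fusion (assumed): local tokens attend over both streams, then residual.
        both = torch.cat([local, glob], dim=1)
        fused, _ = self.fuse(query=local, key=both, value=both)
        return self.norm(x + fused)

A quick shape check, assuming stage-one tokens of a 224x224 input (hypothetical sizes chosen so the 7x7 windows tile the grid evenly):

block = MAFBlock(dim=96, num_heads=4, window_size=7)
tokens = torch.randn(2, 56 * 56, 96)  # (batch, tokens, channels)
out = block(tokens, h=56, w=56)       # -> (2, 3136, 96)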
Keywords
Vision transformer, Multi-scale attention fusion