HA-Transformer: Harmonious aggregation from local to global for object detection.

Expert Syst. Appl. (2023)

Cited 2 | Views 13
Abstract
Recently, the Vision Transformer (ViT), with its global modeling capability, has shown excellent performance on classification tasks, reshaping the development direction of a series of vision tasks. However, due to the enormous cost of multi-head self-attention, reducing computational cost while retaining the capability for global interaction remains a major challenge. In this paper, we propose a new architecture that establishes an end-to-end connection from local to global via bridge tokens, so that global interaction is completed at the window level, effectively addressing the quadratic complexity problem of the transformer. In addition, we consider a hierarchy of information from short-distance to long-distance, adding a transition module from local to global for a more harmonious aggregation of information. We name the proposed method HA-Transformer. Experimental results on the COCO dataset show the excellent performance of HA-Transformer for object detection, outperforming several state-of-the-art methods.
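The abstract describes routing global interaction through per-window bridge tokens so that attention cost drops from quadratic in the token count to roughly linear in it (windowed attention plus a small bridge-level attention). The paper's exact architecture is not given here, so the following is only a minimal illustrative sketch of that general idea, with hypothetical choices (mean-pooled bridge tokens, additive broadcast of global context) that are assumptions, not the authors' method:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention (single head, no learned projections)
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def bridged_window_attention(x, window):
    # x: (n, d) token sequence, n divisible by `window`
    n, d = x.shape
    wins = x.reshape(n // window, window, d)
    # 1) local: self-attention restricted to each window -> O(n * window)
    local = np.stack([attention(w, w, w) for w in wins])
    # 2) one bridge token per window (mean pooling is an assumption here)
    bridges = local.mean(axis=1)                    # (n // window, d)
    # 3) global: bridge tokens attend among themselves -> O((n/window)^2)
    bridges = attention(bridges, bridges, bridges)
    # 4) broadcast the globally mixed context back into every window
    out = local + bridges[:, None, :]
    return out.reshape(n, d)
```

Even in this toy form, the cost structure matches the abstract's claim: no token ever attends across the full sequence, yet every token receives global context through its window's bridge token.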
Keywords
Object detection, Transformer, multi-head self-attention, global interaction, transition module