C3TB-YOLOv5: integrated YOLOv5 with transformer for object detection in high-resolution remote sensing images

Qinggang Wu, Yang Li,Wei Huang,Qiqiang Chen, Yonglei Wu

INTERNATIONAL JOURNAL OF REMOTE SENSING(2024)

引用 0|浏览0
暂无评分
摘要
In the realm of object detection from high-resolution remote sensing images (HRRSIs), the existing YOLOv5 methods encounter several challenges, including dense object arrangements, small object sizes, and complex backgrounds. To tackle these challenges, we propose a novel approach called C3TB-YOLOv5, which combines traditional YOLOv5 with the Transformer model to detect objects in HRRSIs. Unlike conventional YOLOv5 methods that primarily focus on capturing local information from remote sensing scenes, our C3TB-YOLOv5 method incorporates global information through the introduction of a new C3TB module. This module, based on the Transformer multi-head attention mechanism (AM), consists of two branches that extract local and global information from feature maps. By integrating these branches and establishing long-range relationships, our method successfully detects densely arranged small objects in HRRSIs. Furthermore, to improve the accuracy of tiny object detection, a novel detection head has been developed to effectively utilize the unused C3 module, thereby preventing the loss of fine-grained textures and positional features. In addition, we integrate an enhanced SimAM, namely Sim-GMP, into the model to adjust the focus across varying regions, effectively distinguishing the features of interested objects from complex backgrounds. Finally, to address the problem of sample imbalance in remote sensing object detection, the most recent Wise-IoU v3 loss function is employed to improve the accuracy of anchor box predictions for objects. To maintain a high object detection speed, the most critical C3 modules are substituted with the proposed C3TB module for the purpose of striking a good balance between object detection accuracy and model lightweight. Extensive experiments conducted on two remote sensing datasets of NWPU VHR-10 and VisDrone 2019 demonstrates that our method achieves superior object detection performance than state-of-the-art methods.
更多
查看译文
关键词
High-resolution remote sensing images (HRRSIs),object detection,YOLOv5,transformer,tiny detection head,attention mechanism
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要