QueryTrack: Joint-modality Query Fusion Network for RGBT Tracking.

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society (2024)

Abstract
Existing RGB-Thermal (RGB-T) trackers usually treat intra-modal feature extraction and inter-modal feature fusion as two separate processes, so the mutual promotion of extraction and fusion is neglected. As a result, the complementary advantages of RGB-T fusion are not fully exploited, and the independent feature extraction does not adapt to fluctuations in modal quality during tracking. To address these limitations, we design a joint-modality query fusion network in which intra-modal feature extraction and inter-modal fusion are coupled and promote each other via joint-modality queries. The queries are initialized from the multimodal features of the current frame, which makes the subsequent fusion adaptive to fluctuations in modal quality during tracking. The joint-modality query fusion (JQF) then uses the queries to interact with the RGB-T features, unifying intra-modal enhancement and inter-modal interaction for mutual promotion. In this way, JQF can distinguish and enhance complementary modality features while filtering out redundant information. For real-time tracking, we propose regional cross-attention for cross-modal interactions to reduce computational cost. Our end-to-end tracker achieves new state-of-the-art performance on multiple RGBT tracking benchmarks, including LasHeR, VTUAV, RGBT234 and GTOT, while running at real-time speed.
Keywords
Multi-modal fusion, RGB-T tracking, vision transformer
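
The abstract describes queries that are initialized from the current frame's multimodal features and then interact with the RGB-T features through a regional cross-attention. The following is a minimal sketch of that general idea, not the authors' implementation. Several details are assumptions made for illustration only: the function names (`init_queries`, `regional_cross_attention`), the mean-pooling initialization, the partition of tokens into contiguous regions, and the absence of learned projection weights are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def init_queries(rgb_tokens, tir_tokens, num_queries):
    """Build one query per region from the current frame's features.

    Assumption: tokens are split into contiguous regions and each query
    is the mean of the RGB and thermal tokens inside its region.
    """
    n = len(rgb_tokens)
    dim = len(rgb_tokens[0])
    size = n // num_queries
    queries, regions = [], []
    for q in range(num_queries):
        start = q * size
        end = n if q == num_queries - 1 else start + size
        idx = list(range(start, end))
        pooled = [
            sum(rgb_tokens[i][d] + tir_tokens[i][d] for i in idx) / (2 * len(idx))
            for d in range(dim)
        ]
        queries.append(pooled)
        regions.append(idx)
    return queries, regions


def regional_cross_attention(queries, regions, rgb_tokens, tir_tokens):
    """Each query attends only to the RGB and thermal tokens of its region.

    Assumption: keys and values are the raw tokens (no learned projections).
    Returns the fused vector and the attention weights for every query.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused, all_weights = [], []
    for q, idx in zip(queries, regions):
        # Keys from both modalities share one softmax, so the weights
        # express how much each modality contributes to this query.
        keys = [rgb_tokens[i] for i in idx] + [tir_tokens[i] for i in idx]
        weights = softmax([dot(q, k) * scale for k in keys])
        out = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        fused.append(out)
        all_weights.append(weights)
    return fused, all_weights


# Toy frame: 4 tokens of dimension 2, with an uninformative (all-zero) thermal input.
rgb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
tir = [[0.0, 0.0] for _ in range(4)]
queries, regions = init_queries(rgb, tir, num_queries=2)
fused, weights = regional_cross_attention(queries, regions, rgb, tir)
```

In this toy setup each query is compared against only the keys of its own region (2 tokens per modality) rather than all tokens of both modalities, which is the kind of saving a regional restriction can offer. Because RGB and thermal keys compete in the same softmax, the all-zero thermal tokens receive lower weights than the RGB tokens, loosely illustrating how a query-based fusion could react to a degraded modality.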