Transformer fusion and histogram layer multispectral pedestrian detection network

Signal, Image and Video Processing (2023)

Abstract
Because multispectral data are complementary, they can significantly improve pedestrian detection performance, and multispectral pedestrian detection has therefore received great attention from the research community. However, existing pedestrian detection algorithms still suffer from problems such as insufficient information exchange between the two streams and a lack of network design targeted at the characteristics of the image source. In practical applications, different targeted network models are generally used during the day and at night, and the day and night models can simply be switched at inference time. We therefore propose two subnetworks, FTHd (Fusion Transformer Histogram day) and FTn (Fusion Transformer night), tailored to the characteristics of daytime and nighttime images, respectively. In the daytime, the texture features of RGB images are more pronounced, so we first add a histogram layer to the input branch of the detection network and then add the CFT (Cross-Modal Fusion Transformer) module to fuse the features and let them interact. By leveraging the Transformer's self-attention, the network can naturally perform both intra-modal and inter-modal fusion. At night, the light is very weak and thermal images play the key role. Because the texture information is weak, a complex network structure is not required, so we merge the two streams into one to reduce computation and then add a CFT module to fuse the features and let them interact. Compared with baseline methods, the proposed FTHd and FTn achieve improved pedestrian detection accuracy.
Keywords
Multispectral pedestrian detection, Multimodal fusion, Transformer, Histogram layer
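
The abstract notes that self-attention lets the network perform intra-modal and inter-modal fusion at once. One way to picture this is to concatenate the RGB tokens and the thermal tokens into a single sequence and apply ordinary scaled dot-product self-attention to it. The sketch below is illustrative only and is not the authors' CFT implementation: it uses a single head, random projection weights, and no positional embeddings, and every name in it (`self_attention`, `fuse`, `random_matrix`) is hypothetical.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain matrix product of two lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention.
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def fuse(rgb_tokens, thermal_tokens, w_q, w_k, w_v):
    # Assumption: fusion is modelled as attention over the concatenated
    # token sequence, so every token can attend to both modalities.
    tokens = rgb_tokens + thermal_tokens
    fused, weights = self_attention(tokens, w_q, w_k, w_v)
    n = len(rgb_tokens)
    return fused[:n], fused[n:], weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4                                  # toy feature dimension
rgb = random_matrix(3, dim, rng)         # 3 RGB tokens
thermal = random_matrix(3, dim, rng)     # 3 thermal tokens
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

rgb_out, thermal_out, attn = fuse(rgb, thermal, w_q, w_k, w_v)

# For RGB token 0: attention mass kept within RGB vs. sent to thermal.
intra = sum(attn[0][:3])
inter = sum(attn[0][3:])
print(f"intra-modal weight: {intra:.3f}, inter-modal weight: {inter:.3f}")
```

In each row of the attention matrix, the entries that point to tokens of the same modality act as intra-modal fusion and the entries that point to the other modality act as inter-modal fusion, so a single attention operation covers both cases.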