Full-view salient feature mining and alignment for text-based person search

Sheng Xie, Canlong Zhang, Enhao Ning, Zhixin Li, Zhiwen Wang, Chunrong Wei

Expert Systems with Applications (2024)

Abstract
Text-based person search aims to retrieve relevant person images from a large database given textual queries. However, the single-view limitation of surveillance cameras and cross-modal heterogeneity remain open challenges. To address these, we propose a Full-view Salient Feature Mining Network (FLAN) to improve text-image matching in this task. FLAN introduces two key innovations. First, Diffusion-based Full-view Image Augmentation generates informative full-view data from a single image to simulate human visual observation and learn view-invariant features. Second, the Dual-max Text Attention module optimizes spatial and channel-wise text attention to extract the most discriminative words characterizing the person. Together, these innovations handle insufficient, imbalanced, and heterogeneous data for more accurate matching. Extensive experiments on three text-based person search datasets, CUHK-PEDES, ICFG-PEDES, and RSTPReid, demonstrate the superior performance of FLAN, with improved robustness and generalization.
Keywords
Text-based person search, Diffusion, Full-view, Generation, Text attention