SiamCLIM: Text-Based Pedestrian Search via Multi-Modal Siamese Contrastive Learning

Runlin Huang, Shuyang Wu, Leiping Jie, Xinxin Zuo, Hui Zhang

2023 IEEE International Conference on Image Processing (ICIP 2023)

Abstract
Text-based pedestrian search (TBPS) aims to retrieve target persons from an image gallery using descriptive text queries. Despite remarkable progress in recent state-of-the-art approaches, previous works still struggle to efficiently extract discriminative features from multi-modal data. To address the problem of cross-modal fine-grained text-to-image matching, we propose a novel Siamese Contrastive Language-Image Model (SiamCLIM). The model realizes interaction between the textual description and the target person through deep bilateral projection, and uses a siamese network structure to capture the relationship between text and image. Experiments show that our model significantly outperforms state-of-the-art methods on cross-modal fine-grained matching tasks. We conduct downstream-task experiments on the benchmark dataset CUHK-PEDES, and the results demonstrate that our model achieves state-of-the-art performance, outperforming current methods by 11.55%, 11.02%, and 7.76% in top-1, top-5, and top-10 accuracy, respectively.
Keywords
Text-based person search, text-image, multi-modal, contrastive learning, Siamese network
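
The abstract reports results as top-1, top-5, and top-10 accuracy. As a rough illustration of how such a retrieval metric is commonly computed, the sketch below ranks gallery images by cosine similarity to each text query and counts a hit when an image of the same person appears among the first k results. This is not the authors' evaluation code: the function names, the choice of cosine similarity, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_accuracy(text_embs, text_ids, image_embs, image_ids, k):
    """Fraction of text queries whose k most similar gallery images
    include at least one image with the same person identity."""
    hits = 0
    for query, pid in zip(text_embs, text_ids):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(image_embs)),
            key=lambda j: cosine(query, image_embs[j]),
            reverse=True,
        )
        if any(image_ids[j] == pid for j in ranked[:k]):
            hits += 1
    return hits / len(text_embs)


# Toy embeddings (hypothetical): three gallery images and three text queries.
image_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
image_ids = [0, 1, 2]
text_embs = [[0.9, 0.1], [0.6, 0.8], [0.1, 0.9]]
text_ids = [0, 1, 2]

top1 = top_k_accuracy(text_embs, text_ids, image_embs, image_ids, k=1)
top2 = top_k_accuracy(text_embs, text_ids, image_embs, image_ids, k=2)
```

In this toy setup only the first query ranks its matching image first, while all three queries find their match within the first two results, so top-2 accuracy is higher than top-1 accuracy. In practice the embeddings would come from the trained text and image encoders.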