Towards Video-Text Retrieval Adversarial Attack.

Haozhe Yang, Yuhan Xiang, Ke Sun, Jianlong Hu, Xianming Lin

IEEE International Conference on Acoustics, Speech, and Signal Processing (2024)

Abstract
Video-text retrieval is widely applied in economic and security domains, which makes it crucial to evaluate its robustness through adversarial attacks. However, existing research in this area is inadequate. In this paper, we are the first to introduce adversarial attacks to this task. Drawing on metric learning, we propose two novel attack methods: Cross-modal Dual Level Contrastive Attack (CDCA) and Cross-modal Rank Pairing Attack (CRPA). In the white-box scenario, CDCA uses the distribution of head and tail examples in the retrieval list to form positive and negative example sets, employing both coarse- and fine-grained features. In the black-box scenario, CRPA uses rank differences in the retrieval list to form example pairs and adopts the Rank Difference Loss (RDL) as the attack objective. Experiments validate the superiority of our methods. Furthermore, we contribute a benchmark, which lays a foundation for understanding the vulnerability of multi-modal models.
Key words
Video-Text Retrieval, Adversarial Attack, Cross-modality
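
The abstract says that CDCA forms positive and negative example sets from the head and tail of the retrieval list. The sketch below illustrates only that general idea and is not the authors' implementation. The function names, the random embeddings, the InfoNCE-style loss, the temperature `tau`, and the choice to treat the tail as positives and the head as negatives are all assumptions made for illustration. The abstract does not specify any of them.

```python
import numpy as np


def cosine_scores(query, gallery):
    """Cosine similarity between one query vector and each gallery row."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q


def split_head_tail(scores, k):
    """Indices of the k best-ranked (head) and k worst-ranked (tail) items."""
    order = np.argsort(-scores, kind="stable")  # descending similarity
    return order[:k], order[-k:]


def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss (an assumed stand-in): small when the anchor is
    close to the positives and far from the negatives."""
    pos = np.exp(cosine_scores(anchor, positives) / tau).sum()
    neg = np.exp(cosine_scores(anchor, negatives) / tau).sum()
    return float(-np.log(pos / (pos + neg)))


# Hypothetical data: one text-query embedding and 20 video embeddings.
rng = np.random.default_rng(0)
text_query = rng.normal(size=16)
videos = rng.normal(size=(20, 16))

scores = cosine_scores(text_query, videos)
head, tail = split_head_tail(scores, k=3)

# Assumption: an attacker would pull the query toward the tail (positives)
# and push it away from the head (negatives) to disturb the ranking.
attack_loss = contrastive_loss(text_query, videos[tail], videos[head])
# Reference value with the roles swapped (head as positives).
clean_loss = contrastive_loss(text_query, videos[head], videos[tail])
```

Before any perturbation, `attack_loss` is larger than `clean_loss`, because the query is by construction closer to the head than to the tail. An attack in this spirit would perturb the input to drive `attack_loss` down. The optimization procedure is left out here because the abstract does not describe it.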