Adaptive Cross-Modal Transferable Adversarial Attacks From Images to Videos

Zhipeng Wei,Jingjing Chen, Zuxuan Wu,Yu-Gang Jiang

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE(2024)

引用 0|浏览2
暂无评分
摘要
The cross-model transferability of adversarial examples makes black-box attacks to be practical. However, it typically requires access to the input of the same modality as black-box models to attain reliable transferability. Unfortunately, the collection of datasets may be difficult in security-critical scenarios. Hence, developing cross-modal attacks for fooling models with different modalities of inputs would highly threaten real-world DNNs applications. The above considerations motivate us to investigate cross-modal transferability of adversarial examples. In particular, we aim to generate video adversarial examples from white-box image models to attack video CNN and ViT models. We introduce the Image To Video (I2V) attack based on the observation that image and video models share similar low-level features. For each video frame, I2V optimizes perturbations by reducing the similarity of intermediate features between benign and adversarial frames on image models. Then I2V combines adversarial frames together to generate video adversarial examples. I2V can be easily extended to simultaneously perturb multi-layer features extracted from an ensemble of image models. To efficiently integrate various features, we introduce an adaptive approach to re-weight the contributions of each layer based on its cosine similarity values of the previous attack step. Experimental results demonstrate the effectiveness of the proposed method.
更多
查看译文
关键词
Cross-modal attack,transferable attack
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要