Attention Deficit is Ordered! Fooling Deformable Vision Transformers with Collaborative Adversarial Patches.
CoRR (2023)
Abstract
The latest generation of transformer-based vision models has proven superior to
Convolutional Neural Network (CNN)-based models across several vision tasks, a
superiority largely attributed to their remarkable prowess in relation
modeling. Deformable vision transformers significantly reduce the quadratic
complexity of attention by using sparse attention structures, enabling their
use in larger-scale applications such as multi-view vision systems.
Recent work demonstrated adversarial attacks against transformers; we show that
these attacks do not transfer to deformable transformers due to their sparse
attention structure. Specifically, attention in deformable transformers is
modeled using pointers to the most relevant other tokens. In this work, we
contribute the first adversarial attacks that manipulate the attention of
deformable transformers, distracting them into focusing on irrelevant parts of
the image. We also develop new collaborative attacks in which a source patch
manipulates attention to point to a target patch that adversarially attacks the
system. In our experiments, we find that patching only 1% of the input field's
area can drive average precision (AP) to 0%. We also show that the attacks'
ability to redirect attention under the attacker's control makes them versatile
enough to support a range of attacker scenarios.
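
To make the abstract's description of sparse, pointer-like attention concrete, below is a minimal PyTorch sketch of single-head, single-scale deformable attention in the style of Deformable DETR. This is an illustrative sketch, not the authors' code: the class name, shapes, and the single-scale simplification are our own assumptions. Each query predicts a small set of sampling offsets (the "pointers") around its reference point plus a weight per sample, replacing dense O(N²) attention with O(N·K) sampling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableAttentionSketch(nn.Module):
    """Single-head, single-scale deformable attention (illustrative only)."""

    def __init__(self, dim: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        # Each query predicts where to look (K two-dimensional offsets) ...
        self.offset_proj = nn.Linear(dim, n_points * 2)
        # ... and how much each sampled location should contribute.
        self.weight_proj = nn.Linear(dim, n_points)
        self.value_proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, feat_map, ref_points):
        # queries:    (B, N, C)    query tokens
        # feat_map:   (B, C, H, W) feature map the queries attend over
        # ref_points: (B, N, 2)    normalized (x, y) in [0, 1] per query
        B, N, _ = queries.shape
        H, W = feat_map.shape[-2:]
        value = self.value_proj(feat_map)                           # (B, C, H, W)

        offsets = self.offset_proj(queries).view(B, N, self.n_points, 2)
        weights = self.weight_proj(queries).softmax(dim=-1)         # (B, N, K)

        # Sampling locations: reference point plus learned offsets (in cells),
        # mapped to grid_sample's [-1, 1] coordinate convention.
        loc = ref_points.unsqueeze(2) + offsets / queries.new_tensor([W, H])
        grid = 2.0 * loc - 1.0                                      # (B, N, K, 2)

        # Bilinear sampling gathers only K values per query: O(N*K), not O(N^2).
        sampled = F.grid_sample(value, grid, align_corners=False)   # (B, C, N, K)
        out = (sampled * weights.unsqueeze(1)).sum(dim=-1)          # (B, C, N)
        return self.out_proj(out.transpose(1, 2))                   # (B, N, C)


# Toy usage: 2 images, 10 queries, 64-dim features on a 32x32 map.
attn = DeformableAttentionSketch(dim=64)
out = attn(torch.randn(2, 10, 64),
           torch.randn(2, 64, 32, 32),
           torch.rand(2, 10, 2))
print(out.shape)  # torch.Size([2, 10, 64])
```

Under this formulation, the attacks the abstract describes perturb input pixels so that the predicted offsets, i.e., where the pointers land, are redirected to attacker-chosen regions such as a collaborating target patch.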