Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
CoRR (2023)
Abstract
The widespread use of Text-to-Image (T2I) models in content generation
requires careful examination of their safety, including their robustness to
adversarial attacks. Despite extensive research into such attacks, the reasons
for their effectiveness remain underexplored. This paper presents an empirical study
on adversarial attacks against T2I models, focusing on analyzing factors
associated with attack success rates (ASRs). We introduce a new attack
objective, entity swapping using adversarial suffixes, along with two gradient-based
attack algorithms. Human and automatic evaluations reveal the asymmetric nature
of ASRs on entity swap: for example, it is easier to replace "human" with
"robot" in the prompt "a human dancing in the rain" with an adversarial suffix,
but significantly harder to do the reverse. We further propose probing metrics to
establish indicative signals that link the model's beliefs to the adversarial ASR.
We identify conditions resulting in a 60% success probability for adversarial
attacks and others where this likelihood drops below 5%.