Harm Amplification in Text-to-Image Models
CoRR (2024)
Abstract
Text-to-image (T2I) models have emerged as a significant advancement in
generative AI; however, there exist safety concerns regarding their potential
to produce harmful image outputs even when users input seemingly safe prompts.
This phenomenon, where T2I models generate harmful representations that were
not explicit in the input, poses a potentially greater risk than adversarial
prompts, leaving users unintentionally exposed to harms. Our paper addresses
this issue by first introducing a formal definition for this phenomenon, termed
harm amplification. We further contribute to the field by developing
methodologies to quantify harm amplification, in which we consider the harm of
the model output in the context of the user input. We then empirically examine how
these methodologies can be applied to simulate real-world deployment
scenarios, including a quantification of disparate impacts across genders
resulting from harm amplification. Together, our work aims to offer researchers
tools to comprehensively address safety challenges in T2I systems and
contribute to the responsible deployment of generative AI models.
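As a rough illustration of the kind of quantification the abstract describes, the sketch below scores harm amplification as the gap between the harm of the generated image and the harm of the user's prompt, flagging cases where a benign prompt yields a markedly more harmful output. This is a hedged, minimal formulation under assumed inputs: the harm scores, the common [0, 1] scale, and the threshold value are illustrative assumptions, not the paper's actual methodology.

from dataclasses import dataclass


@dataclass
class AmplificationResult:
    prompt_harm: float    # assumed harm score of the user's input prompt, in [0, 1]
    output_harm: float    # assumed harm score of the generated image, in [0, 1]
    amplification: float  # how much harm the output adds beyond the prompt
    amplified: bool       # whether the gap exceeds the chosen threshold


def measure_harm_amplification(prompt_harm: float,
                               output_harm: float,
                               threshold: float = 0.3) -> AmplificationResult:
    """Flag cases where the output is markedly more harmful than the prompt.

    prompt_harm and output_harm are assumed to come from external safety
    classifiers (e.g., a text classifier and an image classifier) calibrated
    to the same [0, 1] scale; threshold is a hypothetical deployment-specific
    cutoff, not a value from the paper.
    """
    amplification = max(0.0, output_harm - prompt_harm)
    return AmplificationResult(
        prompt_harm=prompt_harm,
        output_harm=output_harm,
        amplification=amplification,
        amplified=amplification >= threshold,
    )


if __name__ == "__main__":
    # A seemingly safe prompt (low harm score) paired with a harmful output
    # (high harm score) would be flagged as harm amplification.
    result = measure_harm_amplification(prompt_harm=0.05, output_harm=0.72)
    print(result)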