Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
arXiv (2024)
Abstract
With the growing popularity of text-to-image generative models, there has
been increasing focus on understanding their risks and biases. Recent work has
found that state-of-the-art models struggle to depict everyday objects with the
true diversity of the real world and have notable gaps between geographic
regions. In this work, we aim to increase the diversity of generated images of
common objects such that per-region variations are representative of the real
world. We introduce an inference-time intervention, contextualized Vendi Score
Guidance (c-VSG), that guides the backward steps of latent diffusion models to
increase the diversity of a sample relative to a "memory bank" of previously
generated images while constraining the amount of variation to stay within that of an
exemplar set of real-world contextualizing images. We evaluate c-VSG with two
geographically representative datasets and find that it substantially increases
the diversity of generated images, both for the worst performing regions and on
average, while simultaneously maintaining or improving image quality and
consistency. Additionally, qualitative analyses reveal that diversity of
generated images is significantly improved, including along the lines of
reductive region portrayals present in the original model. We hope that this
work is a step towards text-to-image generative models that reflect the true
geographic diversity of the world.
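
For readers unfamiliar with the Vendi Score that c-VSG builds on, the following is a minimal sketch of how the score itself is commonly computed: the exponential of the Shannon entropy of the eigenvalues of a normalized similarity matrix. The function name `vendi_score`, the choice of cosine similarity, and the toy feature vectors are assumptions made for illustration; the abstract does not specify the feature extractor, the kernel, or how the score enters the guidance term, so this is not the authors' implementation.

```python
import numpy as np


def vendi_score(features):
    """Vendi Score of a set of feature vectors (hypothetical helper).

    Assumes cosine similarity as the kernel, so every sample has
    self-similarity 1. Returns a value between 1 (all samples identical)
    and n (all samples mutually orthogonal).
    """
    x = np.asarray(features, dtype=float)
    # L2-normalize rows so that the Gram matrix holds cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    k = x @ x.T
    # Eigenvalues of K / n are non-negative and sum to 1.
    eigvals = np.linalg.eigvalsh(k / x.shape[0])
    # Drop numerically zero eigenvalues, using the convention 0 * log(0) = 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


# Toy "memory bank" of two near-duplicate feature vectors (made-up values).
memory_bank = np.array([[1.0, 0.0, 0.0],
                        [0.9, 0.1, 0.0]])
similar_candidate = np.array([[1.0, 0.05, 0.0]])
distinct_candidate = np.array([[0.0, 0.0, 1.0]])

score_similar = vendi_score(np.vstack([memory_bank, similar_candidate]))
score_distinct = vendi_score(np.vstack([memory_bank, distinct_candidate]))
print(score_similar, score_distinct)
```

In this toy setup, adding the distinct candidate to the memory bank yields a higher score than adding the near-duplicate, which is the kind of signal a diversity-guidance term could reward.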