The Journey, Not the Destination: How Data Guides Diffusion Models
CoRR (2023)
Abstract
Diffusion models trained on large datasets can synthesize photo-realistic
images of remarkable quality and diversity. However, attributing these images
back to the training data (that is, identifying specific training examples which
caused an image to be generated) remains a challenge. In this paper, we propose
a framework that: (i) provides a formal notion of data attribution in the
context of diffusion models, and (ii) allows us to counterfactually validate
such attributions. Then, we provide a method for computing these attributions
efficiently. Finally, we apply our method to find (and evaluate) such
attributions for denoising diffusion probabilistic models trained on CIFAR-10
and latent diffusion models trained on MS COCO. We provide code at
https://github.com/MadryLab/journey-TRAK.