Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Annual Meeting of the Association for Computational Linguistics (2024)

Abstract
Text-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process. However, the process by which the encoder produces the text representation is unknown. We propose the Diffusion Lens, a method for analyzing the text encoder of T2I models by generating images from its intermediate representations. Using the Diffusion Lens, we perform an extensive analysis of two recent T2I models. Exploring compound prompts, we find that complex scenes describing multiple objects are composed progressively and more slowly compared to simple scenes. Exploring knowledge retrieval, we find that representation of uncommon concepts requires further computation compared to common concepts, and that knowledge retrieval is gradual across layers. Overall, our findings provide valuable insights into the text encoder component in T2I pipelines.
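
The core idea, generating from each intermediate representation of the text encoder rather than only from its final output, can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `ToyTextEncoder` is a hypothetical stand-in for a real text encoder, the `generate` callable stands in for the diffusion model, and normalising each intermediate state before it is used as conditioning is an assumed design choice that the abstract does not specify.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalise each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class ToyTextEncoder:
    """Hypothetical stand-in for a T2I text encoder: a stack of residual layers."""

    def __init__(self, num_layers=4, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [
            rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_layers)
        ]

    def hidden_states(self, token_embeddings):
        """Return the output of every layer, from first to last."""
        states = []
        h = token_embeddings
        for w in self.weights:
            h = h + np.tanh(h @ w)  # toy residual block
            states.append(h)
        return states


def lens_conditionings(encoder, token_embeddings):
    """Build one conditioning tensor per layer (assumed: normalise each state)."""
    return [layer_norm(h) for h in encoder.hidden_states(token_embeddings)]


def diffusion_lens(encoder, token_embeddings, generate):
    """Call `generate` (a placeholder for the diffusion model) once per layer."""
    return [generate(c) for c in lens_conditionings(encoder, token_embeddings)]


if __name__ == "__main__":
    encoder = ToyTextEncoder()
    prompt_embeddings = np.ones((3, 8))  # 3 tokens, 8 dimensions
    # A dummy "generator" that only reports the conditioning shape.
    outputs = diffusion_lens(encoder, prompt_embeddings, lambda c: c.shape)
    print(len(outputs), outputs[0])
```

With a real pipeline, `generate` would run the image generator conditioned on the given tensor, yielding one image per encoder layer that can then be compared across layers.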