DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
CVPR 2024
Abstract
Diffusion-based text-to-image models harbor immense potential for
transferring the style of a reference image. However, current encoder-based
approaches significantly impair the text controllability of text-to-image
models while transferring styles. In this paper, we introduce DEADiff to
address this issue with the following two strategies: 1) a mechanism to
decouple the style and semantics of reference images. The decoupled feature
representations are first extracted by Q-Formers instructed by different text
descriptions, and then injected into mutually exclusive subsets of
cross-attention layers for better disentanglement. 2) A non-reconstructive
learning method. The Q-Formers are trained using paired images rather than the
identical target, in which the reference image and the ground-truth image
share the same style or semantics. We show that DEADiff attains the best
visual stylization results and an optimal balance between the text
controllability inherent in the text-to-image model and style similarity to
the reference image, as demonstrated both quantitatively and qualitatively.
Our project page is https://tianhao-qi.github.io/DEADiff/.
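To make the first strategy concrete, the sketch below shows one plausible reading of the decoupled extraction and injection scheme: two Q-Former-style extractors, steered by different instruction embeddings, produce style and content tokens that are routed to mutually exclusive subsets of cross-attention layers. This is a minimal PyTorch sketch under assumed interfaces; the names (InstructedQFormer, inject, split) and the layer split are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class InstructedQFormer(nn.Module):
    """Stand-in for a Q-Former: learnable queries cross-attend to image
    features together with a text instruction that steers extraction
    toward either style or semantics. (Hypothetical module.)"""

    def __init__(self, dim: int = 768, num_queries: int = 16):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, image_feats, instr_emb):
        # image_feats: (B, N, dim) patch features; instr_emb: (1, M, dim)
        b = image_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        kv = torch.cat([instr_emb.expand(b, -1, -1), image_feats], dim=1)
        out, _ = self.attn(q, kv, kv)
        return out  # (B, num_queries, dim) decoupled representation


def inject(cross_attn_layers, style_tokens, content_tokens, split):
    """Route style tokens to one exclusive subset of cross-attention
    layers and content tokens to the complementary subset. Which layers
    receive which tokens is an assumption here, not the paper's split."""
    for i, layer in enumerate(cross_attn_layers):
        layer.extra_context = style_tokens if i >= split else content_tokens


# Illustrative usage with random tensors standing in for image features.
style_qf, content_qf = InstructedQFormer(), InstructedQFormer()
img_feats = torch.randn(2, 257, 768)      # reference-image patch features
style_instr = torch.randn(1, 4, 768)      # embedding of a "style" instruction
content_instr = torch.randn(1, 4, 768)    # embedding of a "content" instruction
style_tokens = style_qf(img_feats, style_instr)
content_tokens = content_qf(img_feats, content_instr)
```

Routing the two token sets to disjoint layer subsets means neither representation can leak into the other's pathway, which is the disentanglement argument the abstract makes.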
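The second strategy can likewise be sketched as a training objective: the denoising target is a paired image that shares only the style (or only the semantics) of the reference, so the Q-Former cannot minimize the loss by simply reconstructing the reference. The sketch below is hedged accordingly; unet, add_noise, and qformer are placeholder callables, not the authors' code.

```python
import torch
import torch.nn.functional as F


def paired_denoising_loss(unet, add_noise, qformer, text_ctx,
                          ref_feats, target_latents, instr_emb):
    """ref_feats: features of the reference image (the style provider).
    target_latents: latents of a *different* image that shares only the
    reference's style, forcing style/semantics disentanglement."""
    tokens = qformer(ref_feats, instr_emb)           # decoupled features
    noise = torch.randn_like(target_latents)
    t = torch.randint(0, 1000, (target_latents.size(0),),
                      device=target_latents.device)
    noisy = add_noise(target_latents, noise, t)      # forward diffusion step
    pred = unet(noisy, t, text_ctx, tokens)          # conditioned denoising
    return F.mse_loss(pred, noise)                   # standard eps-prediction
```

This is what the abstract calls "non-reconstructive": the ground truth never equals the reference, so transferring its style (rather than copying its content) is the only way to reduce the loss.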