Tuning-Free Image Customization with Image and Text Guidance
arXiv (2024)
Abstract
Despite significant advancements in image customization with diffusion
models, current methods still have several limitations: 1) unintended changes
in non-target areas when regenerating the entire image; 2) guidance solely by a
reference image or text descriptions; and 3) time-consuming fine-tuning, which
limits their practical application. In response, we introduce a tuning-free
framework for simultaneous text-image-guided image customization, enabling
precise editing of specific image regions within seconds. Our approach
preserves the semantic features of the reference image subject while allowing
modification of detailed attributes based on text descriptions. To achieve
this, we propose an innovative attention blending strategy that blends
self-attention features in the UNet decoder during the denoising process. To
our knowledge, this is the first tuning-free method that concurrently utilizes
text and image guidance for image customization in specific regions. Our
approach outperforms previous methods in both human and quantitative
evaluations, providing an efficient solution for various practical
applications, such as image synthesis, design, and creative photography.
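The core idea sketched in the abstract, blending self-attention features from a reference branch into the generation branch only inside the target region, can be illustrated with a minimal masked-blend sketch. This is an assumption-laden illustration, not the paper's implementation: the function name `blend_self_attention`, the per-token mask, and the hard 0/1 blend are all hypothetical; the paper's actual choice of decoder layers and denoising steps is not reproduced here.

```python
import numpy as np

def blend_self_attention(gen_feats, ref_feats, mask):
    """Hypothetical sketch of masked self-attention blending.

    gen_feats, ref_feats: (tokens, dim) self-attention features from the
        generation branch and the reference-image branch.
    mask: (tokens,) array, 1 inside the target edit region, 0 elsewhere.
    """
    m = mask[:, None].astype(gen_feats.dtype)
    # Inside the mask, take the reference subject's features; outside it,
    # keep the generation features so non-target areas stay unchanged.
    return m * ref_feats + (1.0 - m) * gen_feats
```

In a real diffusion pipeline this blend would be applied to selected self-attention layers of the UNet decoder at each denoising step, which is what lets the method preserve the reference subject's semantics without regenerating the rest of the image.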