Customization Assistant for Text-to-image Generation
CVPR 2024(2023)
摘要
Customizing pre-trained text-to-image generation model has attracted massive
research interest recently, due to its huge potential in real-world
applications. Although existing methods are able to generate creative content
for a novel concept contained in single user-input image, their capability are
still far from perfection. Specifically, most existing methods require
fine-tuning the generative model on testing images. Some existing methods do
not require fine-tuning, while their performance are unsatisfactory.
Furthermore, the interaction between users and models are still limited to
directive and descriptive prompts such as instructions and captions. In this
work, we build a customization assistant based on pre-trained large language
model and diffusion model, which can not only perform customized generation in
a tuning-free manner, but also enable more user-friendly interactions: users
can chat with the assistant and input either ambiguous text or clear
instruction. Specifically, we propose a new framework consists of a new model
design and a novel training strategy. The resulting assistant can perform
customized generation in 2-5 seconds without any test time fine-tuning.
Extensive experiments are conducted, competitive results have been obtained
across different domains, illustrating the effectiveness of the proposed
method.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要