Coherent Zero-Shot Visual Instruction Generation
arXiv (2024)
Abstract
Despite the advances in text-to-image synthesis, particularly with diffusion
models, generating visual instructions that require consistent representation
and smooth state transitions of objects across sequential steps remains a
formidable challenge. This paper introduces a simple, training-free framework
to tackle these issues, capitalizing on advances in diffusion models and
large language models (LLMs). Our approach systematically integrates text
comprehension and image generation to ensure visual instructions are visually
appealing and maintain consistency and accuracy throughout the instruction
sequence. We validate its effectiveness by testing multi-step instructions and
comparing text alignment and consistency against several baselines. Our
experiments show that our approach can visualize coherent and visually pleasing
instructions.