Many-to-many Image Generation with Auto-regressive Diffusion Models
arXiv (2024)
Abstract
Image generation has advanced rapidly in recent years, yet existing models remain
limited in perceiving and generating an arbitrary number of interrelated images
within a broad context. This limitation becomes increasingly critical as demand
for multi-image scenarios, such as multi-view images and visual narratives, grows
with the expansion of multimedia platforms. This paper introduces a domain-general
framework for many-to-many image generation, capable of producing an interrelated
image series from a given set of images and offering a scalable solution that
obviates the need for task-specific approaches across different multi-image
scenarios. To facilitate this, we present MIS, a novel large-scale multi-image
dataset containing 12M synthetic multi-image samples, each with 25 interconnected
images. Using Stable Diffusion with varied latent noises, our method produces a
set of interconnected images from a single caption. Leveraging MIS, we learn M2M,
an autoregressive model for many-to-many generation, in which each image is
modeled within a diffusion framework. Through training on the synthetic MIS, the
model excels at capturing style and content from preceding images, synthetic or
real, and generates novel images that follow the captured patterns. Furthermore,
through task-specific fine-tuning, the model demonstrates its adaptability to
various multi-image generation tasks, including Novel View Synthesis and Visual
Procedure Generation.
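To make the autoregressive-diffusion idea in the abstract concrete, here is a minimal, purely illustrative sketch: each "image" in a sequence is sampled by a toy reverse-diffusion loop that starts from Gaussian noise and is conditioned on all previously generated images. The `denoise_step` function is a hypothetical stand-in for the paper's learned denoising network, not its actual architecture; shapes, step counts, and the conditioning rule are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t, context):
    # Toy stand-in for a learned denoiser (hypothetical): pull the noisy
    # sample toward the mean of the conditioning context. A real model
    # would predict noise with a network conditioned on the context.
    target = context.mean(axis=0) if len(context) else np.zeros_like(x)
    return x + (target - x) / (t + 1)

def sample_image(context, shape=(8, 8), steps=10):
    # Reverse diffusion: start from Gaussian noise and iteratively
    # denoise, conditioned on the autoregressive context of prior images.
    x = rng.standard_normal(shape)
    for t in reversed(range(steps)):
        x = denoise_step(x, t, context)
    return x

def generate_sequence(n_images=4, shape=(8, 8)):
    # Autoregressive factorization: image k is sampled given images 0..k-1.
    images = []
    for _ in range(n_images):
        context = np.array(images) if images else np.empty((0, *shape))
        images.append(sample_image(context, shape=shape))
    return images

seq = generate_sequence()
```

The key design point mirrored here is the factorization: the joint distribution over the image series is decomposed autoregressively, and each conditional is realized by a diffusion sampling loop rather than a single feed-forward prediction.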