Comp4D: LLM-Guided Compositional 4D Scene Generation
arXiv (2024)
Abstract
Recent advancements in diffusion models for 2D and 3D content creation have
sparked a surge of interest in generating 4D content. However, the scarcity of
3D scene datasets constrains current methodologies to primarily object-centric
generation. To overcome this limitation, we present Comp4D, a novel framework
for Compositional 4D Generation. Unlike conventional methods that generate a
singular 4D representation of the entire scene, Comp4D innovatively constructs
each 4D object within the scene separately. Utilizing Large Language Models
(LLMs), the framework begins by decomposing an input text prompt into distinct
entities and maps out their trajectories. It then constructs the compositional
4D scene by accurately positioning these objects along their designated paths.
To refine the scene, our method employs a compositional score distillation
technique guided by the pre-defined trajectories, utilizing pre-trained
diffusion models across text-to-image, text-to-video, and text-to-3D domains.
Extensive experiments demonstrate superior 4D content creation compared to
prior art, showing higher visual quality, better motion fidelity, and richer
object interactions.
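The pipeline the abstract describes — an LLM decomposes the prompt into entities, assigns each a trajectory, and the scene is composed by placing objects along those paths over time — can be sketched in miniature. This is an illustrative toy, not the paper's implementation; the names (`Entity`, `linear_trajectory`, `compose_scene`) and the linear-path stand-in for LLM-proposed trajectories are assumptions.

```python
# Toy sketch of Comp4D's compositional idea: each entity carries its own
# trajectory, and the shared scene at time t is assembled by evaluating
# every trajectory. All names here are illustrative, not from the paper.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Entity:
    name: str                            # e.g. "butterfly", from LLM decomposition
    trajectory: Callable[[float], Vec3]  # maps time t in [0, 1] to a 3D position

def linear_trajectory(start: Vec3, end: Vec3) -> Callable[[float], Vec3]:
    """Simplest stand-in for an LLM-proposed path: linear interpolation."""
    def traj(t: float) -> Vec3:
        return tuple(s + t * (e - s) for s, e in zip(start, end))
    return traj

def compose_scene(entities: List[Entity], t: float) -> dict:
    """Place every entity at its trajectory position for timestep t."""
    return {e.name: e.trajectory(t) for e in entities}

# Toy decomposition of "a butterfly flies towards a flower":
scene = [
    Entity("butterfly", linear_trajectory((0.0, 0.0, 0.0), (1.0, 0.5, 0.0))),
    Entity("flower", linear_trajectory((1.0, 0.5, 0.0), (1.0, 0.5, 0.0))),
]
print(compose_scene(scene, 0.5))  # butterfly halfway along its path
```

In the actual method, each per-object 4D representation would additionally be refined by trajectory-guided compositional score distillation against pre-trained text-to-image, text-to-video, and text-to-3D diffusion models; the sketch only shows the placement step.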