The More You See in 2D, the More You Perceive in 3D
CVPR 2024
Abstract
Humans can infer 3D structure from 2D images of an object based on past
experience and improve their 3D understanding as they see more images. Inspired
by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel
view synthesis from an arbitrary number of unposed images. Given a few unposed
images of an object, we adapt a pre-trained view-conditioned diffusion model
together with the camera poses of the images via test-time fine-tuning. The
adapted diffusion model and the obtained camera poses are then utilized as
instance-specific priors for 3D reconstruction and novel view synthesis. We
show that as the number of input images increases, the performance of our
approach improves, bridging the gap between optimization-based prior-less 3D
reconstruction methods and single-image-to-3D diffusion-based methods. We
demonstrate our system on real images as well as standard synthetic benchmarks.
Our ablation studies confirm that this adaptation behavior is key for more
accurate 3D understanding.