StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization
arXiv (2024)
Abstract
Creating large-scale virtual urban scenes in varied styles is inherently
challenging. To facilitate virtual production prototyping and bypass the need
for complex materials and lighting setups, we introduce the first
vision-and-text-driven texture stylization system for large-scale urban scenes,
StyleCity. Taking an image and text as references, StyleCity stylizes a 3D
textured mesh of a large-scale urban scene in a semantics-aware fashion and
generates a harmonious omnidirectional sky background. To achieve this, we
propose to stylize a neural texture field by transferring 2D vision-and-text
priors to 3D globally and locally. During 3D stylization, we progressively
scale the planned training views of the input 3D scene at different levels in
order to preserve high-quality scene content. We then optimize the scene style
globally by adapting the scale of the style image to that of the training
views. Moreover, we enhance local semantic consistency with a semantics-aware
style loss that is crucial for photo-realistic stylization (sketched after the
abstract).
Besides texture stylization, we further adopt a generative diffusion model to
synthesize a style-consistent omnidirectional sky image, which offers a more
immersive atmosphere and assists the semantic stylization process. The stylized
neural texture field can be baked into an arbitrary-resolution texture (see the
baking sketch after the abstract), enabling seamless integration into
conventional rendering pipelines and significantly easing virtual production
prototyping. Extensive experiments demonstrate the superiority of our stylized
scenes in both qualitative and quantitative evaluations and in user studies.
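
The semantics-aware style loss is only named above; the following is a minimal
PyTorch sketch of one plausible form, assuming a classic Gram-matrix style loss
restricted to matching semantic classes (building regions styled by building
regions, sky by sky, and so on). The function names, the per-class averaging,
and the assumption that label maps are resized to the feature resolution are
illustrative, not the authors' implementation.

import torch

def gram(feat: torch.Tensor) -> torch.Tensor:
    # Normalized Gram matrix of a (C, N) feature matrix.
    c, n = feat.shape
    return (feat @ feat.t()) / (c * n)

def semantic_style_loss(render_feat: torch.Tensor,  # (C, H, W) features of the rendered view
                        style_feat: torch.Tensor,   # (C, H, W) features of the style reference
                        render_seg: torch.Tensor,   # (H, W) integer labels at feature resolution
                        style_seg: torch.Tensor) -> torch.Tensor:
    # Match Gram statistics class by class rather than over the whole image,
    # so each scene region follows the corresponding region of the reference
    # instead of the reference's global statistics.
    loss = render_feat.new_zeros(())
    matched = 0
    for cls in torch.unique(render_seg).tolist():
        s_mask = style_seg == cls
        if not s_mask.any():
            continue  # class absent from the reference: nothing to match
        r = render_feat[:, render_seg == cls]  # (C, N_r) features under the mask
        s = style_feat[:, s_mask]              # (C, N_s)
        loss = loss + (gram(r) - gram(s)).pow(2).mean()
        matched += 1
    return loss / max(matched, 1)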
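
Likewise, a minimal sketch of the baking step, assuming the neural texture
field is a module mapping 2D UV coordinates to RGB in [0, 1]; texture_field and
the plain UV grid are hypothetical, and the actual field may instead take
positional encodings or 3D surface samples. Because the field is continuous,
the output resolution is a free parameter, which is what makes the baked
texture arbitrary-resolution.

import torch

@torch.no_grad()
def bake_texture(texture_field: torch.nn.Module,  # hypothetical UV -> RGB field
                 resolution: int = 2048) -> torch.Tensor:
    # Evaluate the continuous field on a dense UV grid; any resolution works.
    u = torch.linspace(0.0, 1.0, resolution)
    v = torch.linspace(0.0, 1.0, resolution)
    uv = torch.stack(torch.meshgrid(u, v, indexing="xy"), dim=-1)  # (R, R, 2)
    rgb = texture_field(uv.reshape(-1, 2))  # (R*R, 3), assumed in [0, 1]
    return rgb.reshape(resolution, resolution, 3).clamp(0.0, 1.0)

The baked image can then be saved and applied as an ordinary texture map in a
conventional rasterization pipeline.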