RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks
CoRR(2024)
摘要
We present a novel unconditional video generative model designed to address
long-term spatial and temporal dependencies. To capture these dependencies, our
approach incorporates a hybrid explicit-implicit tri-plane representation
inspired by 3D-aware generative frameworks developed for three-dimensional
object representation and employs a singular latent code to model an entire
video sequence. Individual video frames are then synthesized from an
intermediate tri-plane representation, which itself is derived from the primary
latent code. This novel strategy reduces computational complexity by a factor
of 2 as measured in FLOPs. Consequently, our approach facilitates the
efficient and temporally coherent generation of videos. Moreover, our joint
frame modeling approach, in contrast to autoregressive methods, mitigates the
generation of visual artifacts. We further enhance the model's capabilities by
integrating an optical flow-based module within our Generative Adversarial
Network (GAN) based generator architecture, thereby compensating for the
constraints imposed by a smaller generator size. As a result, our model is
capable of synthesizing high-fidelity video clips at a resolution of
256×256 pixels, with durations extending to more than 5 seconds at a
frame rate of 30 fps. The efficacy and versatility of our approach are
empirically validated through qualitative and quantitative assessments across
three different datasets comprising both synthetic and real video clips.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要