Motion Inversion for Video Customization
arXiv (2024)
Abstract
In this research, we present a novel approach to motion customization in
video generation, addressing a notable gap in the exploration of motion
representation within video generative models. Recognizing the unique
challenges posed by video's spatiotemporal nature, our method introduces Motion
Embeddings, a set of explicit, temporally coherent one-dimensional embeddings
derived from a given video. These embeddings are designed to integrate
seamlessly with the temporal transformer modules of video diffusion models,
modulating self-attention computations across frames without compromising
spatial integrity. Our approach offers a compact and efficient solution to
motion representation and enables complex manipulations of motion
characteristics through vector arithmetic in the embedding space. Furthermore,
we identify the Temporal Discrepancy in video generative models, which refers
to variations in how different motion modules process temporal relationships
between frames. We leverage this understanding to optimize the integration of
our motion embeddings. Our contributions include the introduction of a tailored
motion embedding for customization tasks, insights into the temporal processing
differences in video models, and a demonstration of the practical advantages
and effectiveness of our method through extensive experiments.
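The abstract describes 1D, per-frame motion embeddings that are injected into the temporal self-attention of a video diffusion model, and manipulated via vector arithmetic. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's implementation: `temporal_self_attention`, the embedding shapes, and the arithmetic on embeddings are all assumptions for illustration. It treats one spatial location's features across frames as a `(frames, channels)` token sequence, adds a motion embedding of the same shape before attention (leaving spatial dimensions untouched), and shows combining two embeddings by simple addition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(x, motion_emb):
    """Toy temporal self-attention with an additive motion embedding.

    x:          (frames, channels) features at one spatial location
    motion_emb: (frames, channels) 1D-per-frame motion embedding (hypothetical)
    """
    h = x + motion_emb                      # modulate frame tokens only
    d = h.shape[-1]
    attn = softmax(h @ h.T / np.sqrt(d))    # attention across frames
    return attn @ h

rng = np.random.default_rng(0)
F, C = 16, 8
x = rng.normal(size=(F, C))
emb_a = rng.normal(size=(F, C)) * 0.1       # e.g. motion from video A
emb_b = rng.normal(size=(F, C)) * 0.1       # e.g. motion from video B
combined = emb_a + emb_b                    # vector arithmetic on embeddings
out = temporal_self_attention(x, combined)  # shape (16, 8)
```

In an actual video diffusion model the embedding would be optimized against a reference video and broadcast over all spatial locations; here the point is only that the modulation is purely temporal, so spatial structure is left intact.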