Boximator: Generating Rich and Controllable Motions for Video Synthesis
CoRR (2024)
Abstract
Generating rich and controllable motion is a pivotal challenge in video
synthesis. We propose Boximator, a new approach for fine-grained motion
control. Boximator introduces two constraint types: hard box and soft box.
Users select objects in the conditional frame using hard boxes and then use
either type of box to roughly or rigorously define the object's position,
shape, or motion path in future frames. Boximator functions as a plug-in for
existing video diffusion models. Its training process preserves the base
model's knowledge by freezing the original weights and training only the
control module. To address training challenges, we introduce a novel
self-tracking technique that greatly simplifies the learning of box-object
correlations. Empirically, Boximator achieves state-of-the-art video quality
(FVD) scores, improving on two base models, with further gains after
incorporating box constraints. Its robust motion controllability is validated
by drastic increases in the bounding box alignment metric. Human evaluation
also shows that users favor Boximator generation results over the base model.
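
As an illustration of the two constraint types described above, the following is a minimal sketch, not the authors' implementation. It assumes that a hard box asks the object's bounding box to match the given box closely (measured here by IoU against a hypothetical threshold `hard_iou`), while a soft box only asks the object to stay inside the given region. The names `Box`, `BoxConstraint`, `BoxKind`, `iou`, `contains`, and `satisfies` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class BoxKind(Enum):
    HARD = "hard"  # assumed meaning: object box should match this box closely
    SOFT = "soft"  # assumed meaning: object box should lie inside this region


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its top-left (x0, y0) and bottom-right (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate or inverted boxes have zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


@dataclass(frozen=True)
class BoxConstraint:
    """One user constraint: which object, in which frame, and how strictly."""
    object_id: int
    frame: int
    kind: BoxKind
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and inner.x1 <= outer.x1 and inner.y1 <= outer.y1)


def satisfies(c: BoxConstraint, observed: Box, hard_iou: float = 0.7) -> bool:
    """Check an observed object box against a constraint.

    `hard_iou` is an illustrative threshold, not a value from the paper.
    """
    if c.kind is BoxKind.HARD:
        return iou(c.box, observed) >= hard_iou
    return contains(c.box, observed)
```

In this sketch, a user would attach a hard constraint to the object in the conditional frame (frame 0) and then add hard or soft constraints in later frames to pin down or loosely bound its motion path.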