Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
arXiv (2024)
Abstract
Comprehensive and constructive evaluation protocols play an important role in
the development of sophisticated text-to-video (T2V) generation models.
Existing evaluation protocols primarily focus on temporal consistency and
content continuity, yet largely ignore the dynamics of video content. Dynamics
are an essential dimension for measuring the visual vividness and the faithfulness
of video content to text prompts. In this study, we propose an effective
evaluation protocol, termed DEVIL, which centers on the dynamics dimension to
evaluate T2V models. For this purpose, we establish a new benchmark comprising
text prompts that fully reflect multiple dynamics grades, and define a set of
dynamics scores corresponding to various temporal granularities to
comprehensively evaluate the dynamics of each generated video. Based on the new
benchmark and the dynamics scores, we assess T2V models with the design of
three metrics: dynamics range, dynamics controllability, and dynamics-based
quality. Experiments show that DEVIL achieves a Pearson correlation exceeding
90% with human ratings, demonstrating its potential to advance T2V generation
models. Code is available at https://github.com/MingXiangL/DEVIL.
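The Pearson correlation cited above measures how linearly an automatic metric tracks a set of reference ratings (for instance, human judgments). The following is a minimal sketch of that computation only, not the DEVIL implementation. The function name `pearson` and the sample values are hypothetical and chosen purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only (not results from the paper).
metric_scores = [0.12, 0.35, 0.50, 0.71, 0.90]
reference_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]

r = pearson(metric_scores, reference_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates that the metric ranks and spaces the samples much as the reference ratings do. A value near 0 indicates no linear relationship.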