PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
CoRR (2024)
Abstract
Recent advancements in audio-visual generative modeling have been propelled
by progress in deep learning and the availability of data-rich benchmarks.
However, the growth is not attributed solely to models and benchmarks.
Universally accepted evaluation metrics also play an important role in
advancing the field. While there are many metrics available to evaluate audio
and visual content separately, there is a lack of metrics that offer a
quantitative and interpretable measure of audio-visual synchronization for
videos "in the wild". To address this gap, we first created a large-scale,
human-annotated dataset (100+ hours) representing nine types of synchronization
errors in audio-visual content and how humans perceive them. We then developed the PEAVS
(Perceptual Evaluation of Audio-Visual Synchrony) score, a novel automatic
metric with a 5-point scale that evaluates the quality of audio-visual
synchronization. We validate PEAVS using a newly generated dataset, achieving a
Pearson correlation of 0.79 at the set level and 0.54 at the clip level when
compared to human labels. In our experiments, we observe a relative gain of 50%
over a natural extension of Fréchet-based metrics for audio-visual synchrony,
confirming PEAVS' efficacy in objectively modeling subjective perceptions of
audio-visual synchronization for videos "in the wild".
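The validation protocol reports Pearson correlation at two granularities: per clip, and per set after averaging within each condition. A minimal sketch of that evaluation is below; the scores and set groupings are made-up illustrative numbers, not data from the paper.

```python
# Hedged sketch: validating an automatic metric against human opinion
# scores via Pearson correlation at the clip and set levels.
# All scores below are illustrative placeholders, not real data.
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Clip-level: one (metric score, human MOS) pair per video clip.
metric_scores = [4.2, 3.1, 2.0, 4.8, 1.5, 3.9]
human_mos = [4.0, 3.5, 2.2, 4.6, 1.8, 3.0]
clip_r = pearson(metric_scores, human_mos)

# Set-level: average within each condition (e.g. one synchronization
# error type) first, then correlate the per-set means.
sets = {"offset": [0, 1], "dropped_frames": [2, 3], "speed_change": [4, 5]}
set_metric = [mean(metric_scores[i] for i in idx) for idx in sets.values()]
set_mos = [mean(human_mos[i] for i in idx) for idx in sets.values()]
set_r = pearson(set_metric, set_mos)
print(f"clip-level r = {clip_r:.2f}, set-level r = {set_r:.2f}")
```

Averaging before correlating smooths out per-clip annotator noise, which is why set-level correlations are typically higher than clip-level ones.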