Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
CoRR(2023)
摘要
A short clip of video may contain progression of multiple events and an
interesting story line. A human need to capture both the event in every shot
and associate them together to understand the story behind it. In this work, we
present a new multi-shot video understanding benchmark Shot2Story20K with
detailed shot-level captions and comprehensive video summaries. To facilitate
better semantic understanding of videos, we provide captions for both visual
signals and human narrations. We design several distinct tasks including
single-shot video and narration captioning, multi-shot video summarization, and
video retrieval with shot descriptions. Preliminary experiments show some
challenges to generate a long and comprehensive video summary. Nevertheless,
the generated imperfect summaries can already significantly boost the
performance of existing video understanding tasks such as video
question-answering, promoting an under-explored setting of video understanding
with detailed summaries.
更多查看译文
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要