Movie101v2: Improved Movie Narration Benchmark
CoRR (2024)
Abstract
Automatic movie narration aims to create video-aligned plot descriptions
to assist visually impaired audiences. It differs from standard video
captioning in that it requires not only describing key visual details but also
inferring the plots developed across multiple movie shots, thus posing unique
and ongoing challenges. To advance the development of automatic movie narrating
systems, we first revisit the limitations of existing datasets and develop a
large-scale, bilingual movie narration dataset, Movie101v2. Second, taking into
account the essential difficulties in achieving applicable movie narration, we
break the long-term goal into three progressive stages and tentatively focus on
the initial stages, which involve understanding within individual clips. We also
introduce a new narration assessment to align with our staged task goals.
Third, using our new dataset, we baseline several leading large vision-language
models, including GPT-4V, and conduct in-depth investigations into the
challenges current models face in movie narration generation. Our findings
reveal that achieving applicable movie narration generation is a fascinating
goal that requires thorough research.