
Video Question Generation for Dynamic Changes

IEEE transactions on circuits and systems for video technology(2024)

Abstract
The video question generation task aims to generate meaningful questions about a video targeting a given answer. Existing methods focus only on static appearance features in the image frames, or simply identify a motion in the video to ask general questions. However, a video contains dynamically changing visual content that deserves to be questioned, e.g., changes in object motions, object states, and relationships among objects, which is more practical and closer to the dynamic world we live in. In this paper, we propose a difference-aware video question generation model that generates questions about temporal differences in the video, i.e., it captures the dynamic changes between image frames of a video to ask questions. To capture these dynamic changes, we use a temporal difference extractor that localizes the differences for each frame pair of a video through an attention mechanism. We then introduce an answer-aware module that selects the answer-related image frame pair, together with its differences, for question generation; this guides the model to focus on answer-related content when questioning. Finally, the output of the answer-aware module is fed to a decoder module to generate questions. Extensive experiments on the SVQA and MSVD-QA datasets show that the proposed model outperforms state-of-the-art models; e.g., our model achieves at least a 17.1% improvement over existing models on the SVQA dataset. This is because our model can generate questions, similar to the ground truths, that involve changes between image frames in videos. Our code is available at https://github.com/Gary-code/D-VQG.
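The pipeline the abstract describes (frame-pair differences, answer-conditioned attention, a context vector passed to the decoder) can be sketched roughly as follows. This is a minimal NumPy illustration of the general idea, not the paper's implementation; the function name, the use of consecutive-frame subtraction as the "difference", and scaled dot-product scoring are all assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_aware_difference_context(frames, answer):
    """Hypothetical sketch of the difference-aware idea.

    frames: (T, d) array of per-frame features.
    answer: (d,) embedding of the target answer.
    Returns an answer-aware difference context vector and the
    attention weights over frame pairs.
    """
    # 1) Temporal differences between consecutive frame pairs
    #    (a stand-in for the paper's temporal difference extractor).
    diffs = frames[1:] - frames[:-1]                     # (T-1, d)
    # 2) Answer-conditioned attention over the frame-pair differences
    #    (a stand-in for the answer-aware module).
    scores = diffs @ answer / np.sqrt(frames.shape[1])   # (T-1,)
    weights = softmax(scores)
    # 3) Attention-weighted difference context, which a decoder
    #    would consume to generate the question.
    context = weights @ diffs                            # (d,)
    return context, weights
```

In a real model, the difference extraction and attention would be learned modules over deep visual features, and the context would condition a sequence decoder rather than being returned directly.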
Keywords
Video question answering, video question generation, video temporal reasoning