Remember and forget: video and text fusion for video question answering

Multimedia Tools Appl.(2018)

引用 3|浏览10
暂无评分
摘要
Video question answering (Video QA) has received much attention in recent years. It can answer questions according to the visual content of a video clip. Video QA task can be solved only according to the video data. But if the video clip has some relevant text information, It can also be solved by using the fused video and text data. How to select the useful region features from the video frames and select the useful text features from the text information needs to be solved. And how to fuse the video and text features also needs to be solved. Therefore, we propose a forget memory network to solve these problems. The forget memory network with video framework can solve Video QA task only according to the video data. It can select the useful region features for the question and forget the irrelevant region features from the video frames. The forget memory network with video and text framework can extract the useful text features and forget the irrelevant text features for the question. And it can fuse the video and text data to solve Video QA task. The fused video and text features can help improve the experimental performance.
更多
查看译文
关键词
Video QA,Forget memory network,Fused video and text features
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要