Scene-Edge GRU for Video Caption

PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020)(2020)

Citations: 8 | Views: 4
Abstract
Recurrent neural networks for video captioning have recently attracted widespread attention. They are essential to the task because they are used in both the video encoding phase and the text description generation phase. However, the traditional encoder-decoder method ignores scene switches in the video during the encoding phase. In this paper, we propose a video encoding scheme that discovers the scene structure of a video and thereby produces a flexible, variable-length encoding. Unlike the classic encoder-decoder scheme, our approach uses a new GRU unit that recognizes discontinuities between video frames and can be trained end-to-end without additional annotation. We evaluate our approach on two large datasets: the MPII movie description dataset and the MSVD dataset. Experiments show that our method finds a representation of the video at an appropriate level and improves on the best previous results on the movie description dataset.
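
The abstract does not give the equations of the proposed unit, so the following is only a minimal sketch of the general idea: a GRU cell with an extra scalar gate that flags a discontinuity between frames, emits the current hidden state as a segment summary, and resets the state. The class name `SceneEdgeGRUSketch`, the gate weights `w_s`, the 0.5 threshold, and the random initialization are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SceneEdgeGRUSketch:
    """Hypothetical GRU cell with a boundary gate (not the paper's exact unit)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        d = input_dim + hidden_dim
        self.hidden_dim = hidden_dim
        self.W_z = rand_matrix(hidden_dim, d, rng)  # update gate weights
        self.W_r = rand_matrix(hidden_dim, d, rng)  # reset gate weights
        self.W_h = rand_matrix(hidden_dim, d, rng)  # candidate state weights
        # Assumed boundary gate: one scalar score per time step.
        self.w_s = [rng.uniform(-0.5, 0.5) for _ in range(d)]

    def step(self, x, h):
        """Process one frame feature x given the previous state h."""
        score = sigmoid(sum(w * v for w, v in zip(self.w_s, x + h)))
        # Hard threshold for readability; end-to-end training would need
        # a differentiable surrogate for this decision.
        boundary = score > 0.5
        h_prev = [0.0] * self.hidden_dim if boundary else h
        xh = x + h_prev  # list concatenation of input and state
        z = [sigmoid(v) for v in matvec(self.W_z, xh)]
        r = [sigmoid(v) for v in matvec(self.W_r, xh)]
        gated = x + [ri * hi for ri, hi in zip(r, h_prev)]
        h_cand = [math.tanh(v) for v in matvec(self.W_h, gated)]
        h_new = [(1 - zi) * hp + zi * hc
                 for zi, hp, hc in zip(z, h_prev, h_cand)]
        return h_new, boundary

    def encode(self, frames):
        """Return one summary vector per detected segment."""
        h = [0.0] * self.hidden_dim
        segments = []
        for t, x in enumerate(frames):
            h_new, boundary = self.step(x, h)
            if boundary and t > 0:
                segments.append(h)  # summary of the segment that just ended
            h = h_new
        segments.append(h)  # summary of the final segment
        return segments
```

With untrained random weights the detected boundaries carry no meaning; the sketch only shows how such a gate yields a variable number of segment vectors (between one and the number of frames) that a decoder could then consume.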
Keywords
Video understanding, video caption, video encoder, scene-edge GRU