Multi-attention mechanism for Chinese description of videos.

Hu Liu, Junxiu Wu, Jiabin Yuan

CSAI (2020)

Abstract
Describing videos in natural language is an active research topic at the intersection of natural language processing and computer vision. However, most video description work generates English descriptions, and Chinese descriptions are rarely addressed. This paper explores generating Chinese descriptions for videos and proposes an improved video description model that combines multi-modal features with a multi-attention mechanism. The model extracts video information from both global features and fine-grained features, and uses the multi-attention mechanism to focus on the more important video information during the decoding stage, which further improves the richness and accuracy of the generated descriptions. Applied to the extended Chinese corpus of MSVD (Microsoft Research Video Description Corpus), the model's highest METEOR score is 9.6% higher than the best result for Chinese video description on MSVD found to date. The model also achieves strong results compared with many state-of-the-art methods in the English setting.
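
To illustrate attending separately over global and fine-grained features at a decoding step, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the scoring function or how the attended contexts are fused, so dot-product scoring and concatenation are assumptions here, and the function names `attend` and `multi_attention_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Soft attention: weight each feature vector by its dot product with the query.

    Dot-product scoring is an assumption; the abstract does not name a scoring function.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return context, weights


def multi_attention_context(decoder_state, global_feats, fine_feats):
    """Attend over each feature stream separately, then concatenate the two contexts.

    Concatenation is an assumed fusion step chosen for illustration only.
    """
    g_ctx, g_w = attend(decoder_state, global_feats)
    f_ctx, f_w = attend(decoder_state, fine_feats)
    return g_ctx + f_ctx, g_w, f_w


# Toy inputs: one decoder state, two global vectors, three fine-grained vectors.
decoder_state = [1.0, 0.0]
global_feats = [[1.0, 0.0], [0.0, 1.0]]
fine_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
context, g_w, f_w = multi_attention_context(decoder_state, global_feats, fine_feats)
```

In this sketch, the feature vectors most similar to the current decoder state receive the largest weights in each stream, so the fused context emphasizes the video information most relevant to the word being generated.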