Emotion Expression With Fact Transfer for Video Description

IEEE Transactions on Multimedia(2022)

引用 8|浏览10
暂无评分
摘要
Translating a video into natural language is a fundamental but challenging task in visual understanding, since there is a great gap between visual content and linguistic sentence. More attention has been paid to this research field and a number of state-of-the-art results are achieved in recent years. However, the emotions in videos are usually overlooked, leading to the generated description sentences being boring and colorless. In this work, we construct a new dataset for video description with emotion expression, which consists of two parts: a re-annotated subset of the MSVD dataset with emotion embedded and another subset annotated with long sentences and rich emotions based on a video emotion recognition dataset. A fact transfer based framework is designed, which incorporates a fact stream and an emotion stream to generate sentences with emotion expression for video description. In addition, we propose a novel approach for sentence evaluation by balancing facts and emotions. A group of experiments are conducted, and the experimental results demonstrate the effectiveness of the proposed methods, including the idea of dataset construction for video description with emotion expression, model training and testing, and the emotion evaluation metric. The project page (including the code and dataset) can be found in https://mic.tongji.edu.cn/ce/70/c9778a183920/page.htm .
更多
查看译文
关键词
Convolutional neural network,emotion,fact transfer,long short-term memory,video description
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要