A Multimodal Framework for State of Mind Assessment with Sentiment Pre-classification

Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop (2019)

Abstract
In this paper, we address the AVEC 2019 State of Mind Sub-Challenge (SoMS) and propose a multimodal state-of-mind (SoM) assessment framework for valence and arousal, respectively. For valence, sentiment analysis is first performed on English text, obtained via German speech recognition and translation, to classify each audio-visual session as a positive or negative narrative. Each overlapping 60 s segment of the session is then fed into an audio-visual SoM assessment model trained for positive or negative narratives, and the mean prediction over all segments is adopted as the final prediction for the session. For arousal, the initial positive/negative classification step is omitted. For the audio-visual SoM assessment models, we propose to extract functional features (Function) and VGGish-based deep learning features (VGGish) from speech, as well as abstract visual features computed by a convolutional neural network (CNN) from the baseline visual features. For each feature stream, a long short-term memory (LSTM) model is trained to predict the valence/arousal value of a segment, and a support vector regression (SVR) model is adopted for the final decision fusion. Experiments on the USoM dataset show that the model combining Function, baseline ResNet features, and baseline VGG features obtains promising valence predictions, with a concordance correlation coefficient (CCC) of up to 0.531 on the test set, much higher than the baseline result of 0.219.
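The abstract describes per-stream LSTM predictions over overlapping 60 s segments, session-level averaging, and SVR decision fusion. Below is a minimal sketch of that fusion stage under one plausible reading (per-stream segment predictions are averaged per session, then fused by an SVR); the array shapes, the random stand-in data, and the SVR hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical per-stream, per-segment predictions: one LSTM per feature
# stream (Function, VGGish, CNN-based visual) emits a valence score for
# each overlapping 60 s segment of a session. Random values stand in for
# the real LSTM outputs and gold labels here.
rng = np.random.default_rng(0)
n_sessions, n_segments, n_streams = 40, 12, 3
segment_preds = rng.normal(size=(n_sessions, n_segments, n_streams))
session_labels = rng.normal(size=n_sessions)

# Session-level score per stream: mean over that session's segments.
session_feats = segment_preds.mean(axis=1)  # shape (n_sessions, n_streams)

# Decision fusion: an SVR maps the per-stream session scores to the
# final valence prediction. Kernel and C are illustrative choices.
fusion = SVR(kernel="rbf", C=1.0)
fusion.fit(session_feats, session_labels)
final_pred = fusion.predict(session_feats)
```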
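Results are reported as the concordance correlation coefficient (CCC), which penalizes both poor correlation and systematic bias between predictions and labels. The following is a small reference implementation of the standard CCC definition, not code from the paper:

```python
import numpy as np

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance correlation coefficient between two 1-D series:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    # Population covariance between the two series.
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```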
Keywords
multimodal SoM assessment model, sentiment analysis, state of mind (SoM), valence