C-GCN: Correlation Based Graph Convolutional Network for Audio-Video Emotion Recognition

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Abstract
With the development of both hardware and deep neural network technologies, tremendous improvements have been achieved in the performance of automatic emotion recognition (AER) based on video data. However, AER remains a challenging task due to subtle expressions, the abstract nature of emotion, and the difficulty of representing multi-modal information. Most existing approaches focus on multi-modal feature learning and fusion strategies, paying attention to the characteristics of a single video while ignoring the correlation among videos. To exploit this correlation, in this paper we propose a novel correlation-based graph convolutional network (C-GCN) for AER, which comprehensively considers the correlation of intra-class and inter-class videos for feature learning and information fusion. More specifically, we introduce a graph model to represent the correlation among videos. This correlation information helps improve the discrimination of node features during graph convolution. Meanwhile, a multi-head attention mechanism is applied to predict the hidden relationships among videos, which strengthens the inter-class correlation and improves classifier performance. The C-GCN is evaluated on the AFEW and eNTERFACE'05 datasets. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
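The abstract describes two core mechanisms: a graph over video samples refined by graph convolution, and multi-head attention used to predict hidden relationships among videos. The following is a minimal, illustrative sketch of those two ideas in NumPy, not the paper's actual implementation; the function names, the scaled dot-product form of the attention, and the choice to average attention heads into a single adjacency matrix are all assumptions made for illustration.

```python
import numpy as np

def multi_head_similarity(H, num_heads, rng):
    """Predict a soft adjacency among video nodes with multi-head attention.

    H: (n, d) node feature matrix, one row per video.
    Assumption: per-head random projections and scaled dot-product scores,
    with heads averaged into one (n, n) row-stochastic matrix.
    """
    n, d = H.shape
    d_head = d // num_heads
    heads = []
    for _ in range(num_heads):
        Wq = rng.standard_normal((d, d_head)) / np.sqrt(d)
        Wk = rng.standard_normal((d, d_head)) / np.sqrt(d)
        Q, K = H @ Wq, H @ Wk
        scores = Q @ K.T / np.sqrt(d_head)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)            # row-wise softmax
        heads.append(P)
    return np.mean(heads, axis=0)

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(H, A_norm, W):
    """One graph-convolution layer: ReLU(A_norm @ H @ W).

    Propagates each node's features to its correlated neighbors, so
    intra-class videos can sharpen each other's representations.
    """
    return np.maximum(A_norm @ H @ W, 0.0)
```

In this sketch, the attention-derived adjacency plays the role of the learned inter-video correlation: nodes (videos) with similar features receive higher edge weights, and the subsequent graph convolution mixes features along those edges.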
Key words
Emotion recognition, Feature extraction, Correlation, Task analysis, Visualization, Face recognition, Convolution, Multi-head attention, Multiple graphs