C-GCN: Correlation Based Graph Convolutional Network for Audio-Video Emotion Recognition

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Abstract
With the development of both hardware and deep neural network technologies, tremendous improvements have been achieved in the performance of automatic emotion recognition (AER) based on video data. However, AER remains a challenging task due to subtle expressions, the abstract nature of emotion, and the difficulty of representing multi-modal information. Most existing approaches focus on multi-modal feature learning and fusion strategies, paying attention to the characteristics of a single video while ignoring the correlation among videos. To exploit this correlation, in this paper we propose a novel correlation-based graph convolutional network (C-GCN) for AER, which comprehensively considers the correlation of intra-class and inter-class videos for feature learning and information fusion. More specifically, we introduce a graph model to represent the correlation among videos. This correlation information helps improve the discrimination of node features during graph convolution. Meanwhile, a multi-head attention mechanism is applied to predict the hidden relationships among videos, which strengthens the inter-class correlation and improves classifier performance. The C-GCN is evaluated on the AFEW and eNTERFACE'05 datasets. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
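The abstract describes two core mechanisms: a graph over video samples refined by graph convolution, and multi-head attention used to predict hidden relationships among videos. The following is a minimal, illustrative sketch of those two ideas in NumPy, not the paper's actual implementation; the function names, the scaled dot-product form of the attention, and the choice to average attention heads into a single adjacency matrix are all assumptions made for illustration.

```python
import numpy as np

def multi_head_similarity(H, num_heads, rng):
    """Predict a soft adjacency among video nodes with multi-head attention.

    H: (n, d) node feature matrix, one row per video.
    Assumption: per-head random projections and scaled dot-product scores,
    with heads averaged into one (n, n) row-stochastic matrix.
    """
    n, d = H.shape
    d_head = d // num_heads
    heads = []
    for _ in range(num_heads):
        Wq = rng.standard_normal((d, d_head)) / np.sqrt(d)
        Wk = rng.standard_normal((d, d_head)) / np.sqrt(d)
        Q, K = H @ Wq, H @ Wk
        scores = Q @ K.T / np.sqrt(d_head)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)            # row-wise softmax
        heads.append(P)
    return np.mean(heads, axis=0)

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(H, A_norm, W):
    """One graph-convolution layer: ReLU(A_norm @ H @ W).

    Propagates each node's features to its correlated neighbors, so
    intra-class videos can sharpen each other's representations.
    """
    return np.maximum(A_norm @ H @ W, 0.0)
```

In this sketch, the attention-derived adjacency plays the role of the learned inter-video correlation: nodes (videos) with similar features receive higher edge weights, and the subsequent graph convolution mixes features along those edges.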
Key words
Emotion recognition, Feature extraction, Correlation, Task analysis, Visualization, Face recognition, Convolution, Multi-head attention, Multiple graphs