SERC-GCN: Speech Emotion Recognition In Conversation Using Graph Convolutional Networks
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)
Abstract
Speech emotion recognition (SER) is the task of automatically recognizing emotions expressed in spoken language. Current approaches focus on analyzing isolated speech segments to identify a speaker’s emotional state. Meanwhile, recent text-based emotion recognition methods have shifted towards emotion recognition in conversation (ERC), which takes conversational context into account. Motivated by this shift, we propose SERC-GCN, a method for speech emotion recognition in conversation (SERC) that predicts a speaker’s emotional state by incorporating conversational context, speaker interactions, and temporal dependencies between utterances. SERC-GCN is a two-stage method. First, emotional features are extracted from utterance-level speech signals. Then, these features are used to build conversation graphs, on which a graph convolutional network is trained to perform SERC. We empirically evaluate SERC-GCN and show that it outperforms the current state-of-the-art methods on the IEMOCAP benchmark dataset.
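The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the conversation graphs are built, so the context-window edge scheme, the function names (`build_conversation_graph`, `normalize_adjacency`, `gcn_layer`), the dimensions, and the random stand-in features are all assumptions made for illustration. The graph convolution uses the common symmetric normalization D^{-1/2} A D^{-1/2}, and the sketch ignores speaker identity and edge direction, which the actual method takes into account.

```python
import numpy as np


def build_conversation_graph(num_utterances, window=2):
    """Adjacency matrix linking each utterance to its neighbours within
    `window` turns (an assumed scheme, not taken from the paper)."""
    adj = np.zeros((num_utterances, num_utterances))
    for i in range(num_utterances):
        lo = max(0, i - window)
        hi = min(num_utterances, i + window + 1)
        adj[i, lo:hi] = 1.0  # the slice includes the self-loop at (i, i)
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)  # at least 1 for every node because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj_norm, weight):
    """One graph convolution followed by ReLU: max(A_norm X W, 0)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


# Hypothetical sizes: 6 utterances, 16-dim features, 4 emotion classes.
rng = np.random.default_rng(0)
num_utterances, feat_dim, hidden_dim, num_classes = 6, 16, 8, 4

# Stage 1 stand-in: random vectors in place of real utterance-level features.
features = rng.normal(size=(num_utterances, feat_dim))

# Stage 2: build the conversation graph and run a two-layer GCN forward pass
# with untrained random weights.
adj_norm = normalize_adjacency(build_conversation_graph(num_utterances, window=2))
w1 = rng.normal(size=(feat_dim, hidden_dim))
w2 = rng.normal(size=(hidden_dim, num_classes))
hidden = gcn_layer(features, adj_norm, w1)
logits = adj_norm @ hidden @ w2          # one row of class scores per utterance
predictions = logits.argmax(axis=1)      # predicted class index per utterance
```

In this sketch each utterance's prediction depends on its neighbours in the conversation, which is the property that separates SERC from classifying isolated segments. Training the weights and encoding speaker relations would need to be added on top.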
Keywords
speech emotion recognition in conversation, human-computer interaction, graph convolutional network