CCTG-NET: Contextualized Convolutional Transformer-GRU Network for speech emotion recognition

International Journal of Speech Technology (2023)

Abstract
Speech is a crucial aspect of human-to-human interaction and plays a fundamental role in the advancement of human–computer interaction (HCI) systems. Developing an accurate speech emotion recognition (SER) system for human conversations is a critical yet challenging task. Existing state-of-the-art (SOTA) research in SER primarily focuses on modeling vocal information within individual conversational speech utterances, overlooking the significance of incorporating transactional information from the interaction context. In this paper, we present a novel Contextualized Convolutional Transformer-GRU Network (CCTG-Net) that recognizes speech emotions from Mel-spectrogram features and effectively integrates contextual information into emotion recognition. Our experiments are conducted on the widely used emotional benchmark dataset IEMOCAP. In four-class emotion recognition, our proposed model achieves a weighted accuracy (WA) of 88.4% and an unweighted accuracy (UA) of 89.1%. Compared to SOTA methods, this marks a substantial 3.0% improvement in UA while maintaining an optimal balance between performance and complexity.
Keywords
Speech emotion recognition, Conversational speech, Mel-spectrogram, Convolutional neural network, Transformer, Gated recurrent unit
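
The abstract reports both a weighted accuracy and an unweighted accuracy (UA) without defining them. The sketch below illustrates the convention commonly used in SER work, which is assumed here rather than taken from the paper: weighted accuracy is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls. The function names and the toy labels are hypothetical and serve only to show how the two metrics can diverge on imbalanced classes.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hits = defaultdict(int)   # correctly predicted utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Hypothetical toy labels (not data from the paper): four classes, imbalanced.
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] + ["angry"]
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["sad"] + ["neutral"]

wa = weighted_accuracy(y_true, y_pred)    # 8 of 10 utterances correct
ua = unweighted_accuracy(y_true, y_pred)  # recalls 1.0, 0.5, 1.0, 0.0 averaged
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

On this toy example the majority class inflates the weighted accuracy (0.800) relative to UA (0.625), which is why results on imbalanced emotion corpora are often reported with both figures.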