Speech Emotion Recognition Using Deep CNNs Trained on Log-Frequency Spectrograms

Mainak Biswas, Mridu Sahu, Maroi Agrebi, Pawan Kumar Singh, Youakim Badr

Studies in Big Data (2023)

Abstract
Speech is the most important means of communication between humans, and every phrase a person speaks carries certain emotions intertwined with it. A natural goal, therefore, is to build a system that understands the mood and feelings of the speaker. Speech emotion detection has many real-life applications, ranging from improving recommendation systems (which can adapt to the emotion the user is experiencing) to monitoring people with chronic depression and suicidal tendencies. In this chapter, we propose a model for recognizing emotions from speech data using log-frequency spectrograms and a deep convolutional neural network (CNN). We augment our data with noise of varied loudness obtained from various contexts, with the aim of making the model resilient to noise. Spectrograms are extracted from the augmented data, and these spectrogram images are used to train the deep CNN proposed in this chapter. The model is independent of linguistic features, speaker-dependent features, the gender of the speakers, and the intensity of the expressed emotion. This is ensured by using the RAVDESS dataset, in which the same sentences are spoken by 24 speakers (12 male and 12 female) with different expressions, each at two levels of intensity. The model obtained an accuracy of 98.13% on this dataset. The experimental results show that the proposed model is quite capable of classifying emotions from human speech. The source code of the proposed model is available at: https://github.com/mainak-biswas1999/Spoken_Emotion_classification.git
Keywords
deep CNNs, log-frequency spectrograms
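The noise-augmentation step described in the abstract (mixing noise of varied loudness into clean speech before spectrogram extraction) can be sketched as follows. This is a minimal NumPy illustration under the common signal-to-noise-ratio formulation, not the authors' implementation; the synthetic tone, white noise, and SNR values are placeholders.

```python
import numpy as np

def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` at a target signal-to-noise ratio in dB."""
    # Tile or trim the noise clip to match the length of the clean signal.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    # Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

# Example: a synthetic 440 Hz tone plus white noise at several loudness levels.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)   # 1 s at a 16 kHz sample rate
clean = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(16000)
augmented = {snr: add_noise(clean, noise, snr) for snr in (20, 10, 0)}
```

Each augmented waveform would then go through the spectrogram-extraction stage; lower SNR values correspond to louder injected noise.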