Speech Emotion Recognition using Spectral Images and Convolutional Neural Network

2023 IEEE 20th India Council International Conference (INDICON)(2023)

Abstract
Automatic speech emotion identification by computer is a difficult and complex task. Speech emotion recognition (SER) has attracted researchers for over three decades because of its wide range of applications in fields such as medical treatment, marketing, customer service, driving, internet search, and education, and researchers have used many approaches to improve emotion classification performance. In our work, we used images of mel-frequency cepstral coefficients (MFCC), mel-spectrograms, and a combination of both as feature inputs to a two-dimensional convolutional neural network (2D-CNN) classifier. We trained the model on each feature's images individually and on their combination. The experimental results show that the combined MFCC and mel-spectrogram feature outperformed either feature alone for speech emotion recognition. To assess the efficacy of our features, we used three datasets: TESS, RAVDESS, and EMO-DB. The emotion classification accuracy was 88.89% on EMO-DB, 100% on TESS, and 81.2% on RAVDESS.
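The abstract does not give the feature-extraction settings or say exactly how the MFCC and mel-spectrogram images were combined. The pure-Python sketch below illustrates one plausible version of that front end, under stated assumptions that are not the authors' parameters: an HTK-style mel scale, 256-sample Hann-windowed frames with a 128-sample hop, 40 mel bands, 13 MFCCs taken as a DCT-II of the log-mel frames, per-feature min-max scaling to [0, 1] (as when saving a grayscale image), and concatenation of the two maps along the feature axis. All function names (`log_mel_spectrogram`, `mfcc_from_log_mel`, `combined_feature_image`, and so on) are hypothetical.

```python
import math

# Minimal illustrative sketch; every name and parameter here is a hypothetical
# assumption, not the configuration used in the paper.

def hz_to_mel(f):
    # HTK-style mel scale (assumed; the abstract does not name a variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping n_fft//2 + 1 power bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sr / 2.0)
    # n_mels + 2 points equally spaced on the mel scale give each filter's edges
    mel_pts = [mel_max * i / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mel_pts]
    fb = [[0.0] * n_bins for _ in range(n_mels)]
    for j in range(n_mels):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):        # rising edge
            fb[j][k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge
            fb[j][k] = (right - k) / (right - center)
    return fb

def frame_signal(signal, frame_len, hop):
    # Split into overlapping frames and apply a periodic Hann window
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    return [[signal[s + n] * window[n] for n in range(frame_len)]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    # Naive O(N^2) DFT for clarity; a real pipeline would use an FFT
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def log_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=40):
    fb = mel_filterbank(n_mels, n_fft, sr)
    out = []
    for frame in frame_signal(signal, n_fft, hop):
        p = power_spectrum(frame)
        energies = [sum(w * v for w, v in zip(row, p)) for row in fb]
        out.append([10.0 * math.log10(e + 1e-10) for e in energies])  # dB scale
    return out  # rows = time frames, columns = mel bands

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    # Unnormalized DCT-II across the mel bands of each frame
    n_mels = len(log_mel[0])
    return [[sum(row[m] * math.cos(math.pi * k * (m + 0.5) / n_mels) for m in range(n_mels))
             for k in range(n_mfcc)] for row in log_mel]

def to_unit_range(feat):
    # Min-max scale a 2-D feature map to [0, 1], like a grayscale image
    lo = min(min(r) for r in feat)
    hi = max(max(r) for r in feat)
    scale = (hi - lo) or 1.0
    return [[(v - lo) / scale for v in r] for r in feat]

def combined_feature_image(signal, sr):
    log_mel = log_mel_spectrogram(signal, sr)
    mel_img = to_unit_range(log_mel)
    mfcc_img = to_unit_range(mfcc_from_log_mel(log_mel))
    # Hypothetical combination: concatenate the two maps along the feature axis
    return [a + b for a, b in zip(mel_img, mfcc_img)]

if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]  # 0.1 s, 440 Hz
    img = combined_feature_image(tone, sr)
    print(len(img), len(img[0]))  # prints "5 53": 5 frames x (40 mel + 13 MFCC)
```

The resulting map (time frames by 53 feature rows) would then be passed to a 2D-CNN classifier. The abstract does not describe the network's layers, so none is sketched here. Stacking the two maps as separate image channels is an equally plausible reading of "combination", but it would require resizing them to a common shape first.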
Key words
SER, MFCC, CNN, mel-spectrogram