Exploring the Impact of Spatio-Temporal Patterns in Audio Spectrograms on Emotion Recognition

2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation (ICAMIMIA)(2023)

Cited 0|Views6
No score
Abstract
Speech emotion recognition plays a vital role in enhancing human-computer interaction and improving user experience in various applications. This paper investigates the utilization of spatio-temporal patterns in speech emotion recognition, contrasting them with conventional methods that rely solely on spatial or temporal information. The approach involves a parallel architecture, coupling Convolutional Neural Networks (CNNs) with Transformers as an encoder block network. This design combines the spatial feature extraction capabilities of CNNs with the temporal modeling strengths of Transformers, enabling the capture of intricate patterns and contextual relationships within speech data. We present a comprehensive experimental analysis conducted on three benchmark datasets, shedding light on the impact of the utilization of spatio-temporal patterns in advancing the field of speech emotion recognition.
More
Translated text
Key words
audio signal processing,speech emotion recognition,spatio-temporal pattern,automation,technology
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined