Learning Salient Segments for Speech Emotion Recognition Using Attentive Temporal Pooling

IEEE Access (2020)

Abstract
In the temporal process of expressing emotions, some intervals carry more salient emotional information than others. In this paper, by introducing an attentive temporal pooling module into a deep neural network (DNN) architecture, we present a simple but effective speech emotion recognition (SER) framework that automatically highlights emotionally salient segments while suppressing the influence of less relevant ones. For an input speech utterance, the extracted sequence of hand-crafted low-level descriptors (LLDs) is evenly split into several overlapping temporal segments, and segment-level features are computed by applying statistical functionals to the LLDs of each segment. These segment-level features are then fed into a DNN model that outputs the emotion probabilities as well as a more condensed representation of each segment. An attentive temporal pooling module, consisting of an auxiliary DNN and a Gaussian Mixture Model (GMM), is proposed to learn the emotional saliency weights of the temporal segments from these condensed representations; the weights are then applied to the segment-level emotion probabilities to produce the final utterance-level prediction. Notably, the attentive temporal pooling module and the feature-abstraction DNN can be trained jointly using only utterance-level labels, without any frame-level or segment-level supervision. Experimental results on three publicly released emotion datasets, RML, EMO-DB, and IEMOCAP, show that the proposed framework achieves state-of-the-art SER performance.
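The segment-then-pool pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the segment length and hop, the four functionals (mean, standard deviation, min, max), and the softmax that turns saliency scores into weights are all assumptions. The paper derives its weights with an auxiliary DNN plus a GMM, which is not reproduced here; the `scores` argument simply stands in for whatever per-segment saliency that module produces, and the segment-level probabilities stand in for the main DNN's outputs.

```python
import math
from typing import List, Sequence


def split_segments(frames: Sequence[Sequence[float]], seg_len: int,
                   hop: int) -> List[List[List[float]]]:
    """Split a (T x D) LLD sequence into overlapping windows.

    Assumption: windows of `seg_len` frames advance by `hop` frames
    (hop < seg_len gives overlap); trailing frames that do not fill a
    full window are dropped.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    return [[list(f) for f in frames[s:s + seg_len]]
            for s in range(0, len(frames) - seg_len + 1, hop)]


def functionals(segment: Sequence[Sequence[float]]) -> List[float]:
    """Segment-level feature: mean, population std, min, max per LLD dim.

    The choice of these four functionals is an illustrative assumption.
    """
    feats: List[float] = []
    for d in range(len(segment[0])):
        col = [frame[d] for frame in segment]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        feats.extend([mean, std, min(col), max(col)])
    return feats


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (stand-in for the paper's GMM weighting)."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attentive_pool(seg_probs: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> List[float]:
    """Weight segment-level emotion probabilities by saliency and sum them.

    `scores` is a hypothetical per-segment saliency score; higher means
    the segment contributes more to the utterance-level prediction.
    """
    w = softmax(scores)
    n_cls = len(seg_probs[0])
    return [sum(w[i] * seg_probs[i][c] for i in range(len(seg_probs)))
            for c in range(n_cls)]


if __name__ == "__main__":
    # Toy 10-frame, 2-dimensional LLD sequence.
    frames = [[float(t), 2.0 * t] for t in range(10)]
    segments = split_segments(frames, seg_len=4, hop=2)
    seg_feats = [functionals(s) for s in segments]
    print(len(segments), len(seg_feats[0]))
    # Two segments' class probabilities; the first is scored as more salient.
    print(attentive_pool([[0.9, 0.1], [0.2, 0.8]], [2.0, 0.0]))
```

Because the weights are a convex combination, the pooled output stays a valid probability distribution whenever each segment's output is one; a segment with a much higher saliency score dominates the utterance-level prediction, which is the "highlight salient, suppress irrelevant" behavior the abstract describes.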
Keywords
Feature extraction,Hidden Markov models,Emotion recognition,Speech recognition,Computer architecture,Computational modeling,Support vector machines,Attentive temporal pooling,deep neural networks,hand-crafted audio features,speech emotion recognition