Speech Emotion Recognition With Acoustic And Lexical Features
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015
Abstract
In this paper, we explore one of the key aspects of building an emotion recognition system: generating suitable feature representations. We generate feature representations at both the acoustic and lexical levels. At the acoustic level, we first extract low-level features such as intensity, F0, jitter, shimmer, and spectral contours. We then generate different acoustic feature representations based on these low-level features, including statistics over these features, a new representation derived from a set of low-level acoustic codewords, and a new representation from Gaussian Supervectors. At the lexical level, we propose a new feature representation named emotion vector (eVector). We also use the traditional Bag-of-Words (BoW) feature. We apply these feature representations to emotion recognition and compare their performance on the USC-IEMOCAP database. We also combine these different feature representations via early fusion and late fusion. Our experimental results show that late fusion of both acoustic and lexical features achieves a four-class emotion recognition accuracy of 69.2%.
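The abstract distinguishes early fusion from late fusion without giving details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes early fusion means concatenating acoustic and lexical feature vectors before classification, and late fusion means a weighted average of per-class scores from two separately trained classifiers. The label set, the function names, and the `acoustic_weight` parameter are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence

# Assumed four-class label set; the abstract does not name the classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def early_fusion(acoustic: Sequence[float], lexical: Sequence[float]) -> List[float]:
    """Early fusion (assumed form): concatenate feature vectors so that a
    single classifier sees both acoustic and lexical features."""
    return list(acoustic) + list(lexical)


def late_fusion(
    acoustic_probs: Sequence[float],
    lexical_probs: Sequence[float],
    acoustic_weight: float = 0.5,
) -> List[float]:
    """Late fusion (assumed form): weighted average of the per-class scores
    produced by an acoustic classifier and a lexical classifier."""
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")
    if len(acoustic_probs) != len(lexical_probs):
        raise ValueError("both classifiers must score the same classes")
    w = acoustic_weight
    return [w * a + (1.0 - w) * l for a, l in zip(acoustic_probs, lexical_probs)]


def predict(fused_probs: Sequence[float], labels: Sequence[str] = EMOTIONS) -> str:
    """Return the label with the highest fused score."""
    best = max(range(len(fused_probs)), key=lambda i: fused_probs[i])
    return labels[best]


# Made-up scores for one utterance, in the order given by EMOTIONS.
acoustic_scores = [0.40, 0.35, 0.15, 0.10]
lexical_scores = [0.10, 0.60, 0.20, 0.10]

fused = late_fusion(acoustic_scores, lexical_scores, acoustic_weight=0.5)
label = predict(fused)
```

In this toy case the acoustic classifier alone would pick "angry", while the equally weighted fused scores favor "happy". In practice the weight would be tuned on held-out data, and the fused scores could instead come from a second-stage classifier.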
Keywords
Emotion recognition, Acoustic features, Emotion lexicon, Lexical features, Support vector machine