Emotion recognition from spontaneous speech using emotional vowel-like regions

Multimedia Tools and Applications (2022)

Abstract
Spontaneous speech varies in characteristics such as emotion, volume, and pitch. Emotion itself is not uniformly distributed across an utterance. Extracting the portions of an utterance that carry meaningful emotional information remains challenging. Vowel-like regions (VLRs) are known to contain emotion-specific information. However, in spontaneous speech, not all VLRs in an utterance contain emotion. This paper proposes a method for extracting the emotional VLRs from a set of vowels in an utterance based on the fundamental frequency of a VLR. Further, the recently proposed epoch-synchronous single frequency cepstral coefficient (SFCC) features are combined with epoch-based features, producing a result 1.33% better than the state-of-the-art technique. In general, accuracy drops for long utterances because not all VLRs in a long utterance are consistent with the ground-truth label. However, the proposed approach improved accuracy by 8.22% on long utterances when emotional VLRs were used in place of all VLRs.
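The abstract does not state the exact selection rule, so the following is only a minimal sketch of what fundamental-frequency-based VLR selection could look like. It assumes each VLR has already been reduced to a mean F0 value in Hz, and it keeps the VLRs whose F0 deviates from the utterance median by more than a relative threshold. The function name `select_emotional_vlrs`, the parameter `rel_threshold`, its default of 0.15, and the deviation-from-median criterion are all hypothetical and are not taken from the paper.

```python
from statistics import median


def select_emotional_vlrs(vlr_f0_hz, rel_threshold=0.15):
    """Return indices of VLRs whose mean F0 deviates strongly from the median.

    Assumption (not from the paper): a VLR counts as "emotional" when its
    mean F0 differs from the utterance-level median F0 by more than
    `rel_threshold`, expressed as a fraction of that median.
    """
    if not vlr_f0_hz:
        return []
    reference_f0 = median(vlr_f0_hz)
    return [
        index
        for index, f0 in enumerate(vlr_f0_hz)
        if abs(f0 - reference_f0) / reference_f0 > rel_threshold
    ]


# Hypothetical mean F0 values (Hz) for five VLRs in one utterance.
vlr_f0_hz = [118.0, 121.0, 190.0, 119.5, 82.0]
selected = select_emotional_vlrs(vlr_f0_hz)
print(selected)
```

With these made-up values the median is 119.5 Hz, so only the third and fifth VLRs exceed the 15% deviation and would be passed on to feature extraction. A real system would first need VLR boundaries and a pitch tracker, both of which are outside this sketch.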
Keywords
Emotional vowel, Single frequency cepstral coefficients (SFCC), Fundamental frequency, Speech emotion recognition (SER), Long utterance, Vowel-like regions (VLRs)