Emotion embedding framework with emotional self-attention mechanism for speaker recognition

Dongdong Li, Zhuo Yang, Jinlin Liu, Hai Yang, Zhe Wang

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Abstract
The emotional state of speech has a strong impact on the performance of speaker recognition (SR) systems. Many researchers focus on mapping speech with different emotions to an emotion-invariant embedding, which reduces the diversity of the data. This paper proposes a new emotion embedding framework with a self-attention mechanism for speaker recognition. First, during the development phase, several deep neural networks (DNNs) are trained to classify speakers in different emotional states; these serve as emotion embedding extractors. Then, at the enrollment stage, the pre-trained models are used to derive emotion embeddings from neutral speech. To make the final speaker embedding more representative, the classification model is trained with a self-attention mechanism along the emotion dimension, so that the framework automatically assigns weights to the emotion embeddings. Experiments were carried out on both the Mandarin Affective Speech Corpus (MASC) and the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D). Compared with state-of-the-art methods, the proposed method achieves the best Identification Rate (IR) and Equal Error Rate (EER): 59.14% and 15.79% on MASC, and 75.98% and 8.14% on CREMA-D. In addition, cross-database experiments further demonstrate the practicability of the method in real-world scenarios.
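The weighting step described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring function, so this example assumes scaled dot-product scores against a single learned query vector followed by a softmax over the emotion dimension. The function names, the `query` vector, and the toy embedding values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(emotion_embeddings, query):
    """Combine per-emotion embeddings into one speaker embedding.

    Assumption: each embedding is scored by a scaled dot product with a
    query vector (learned in a real system, fixed here), and the softmax
    of those scores gives the weight of each emotion embedding.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, emb)) / math.sqrt(dim)
        for emb in emotion_embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, emotion_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


# Toy values: three emotion embeddings (one per hypothetical extractor),
# each of dimension 4, all derived from the same enrollment utterance.
emotion_embeddings = [
    [0.9, 0.1, 0.0, 0.2],
    [0.7, 0.3, 0.1, 0.1],
    [0.8, 0.0, 0.2, 0.3],
]
query = [0.5, -0.2, 0.1, 0.4]  # hypothetical stand-in for a learned parameter

speaker_embedding, emotion_weights = attention_pool(emotion_embeddings, query)
print("weights per emotion:", emotion_weights)
print("speaker embedding:", speaker_embedding)
```

In this sketch the weights always sum to one, so the pooled speaker embedding is a convex combination of the emotion embeddings. A zero query would reduce it to a plain average.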
Keywords
Speaker recognition, Emotional states, Emotion embedding, Emotional self-attention