Formant-Gaps Features For Speaker Verification Using Whispered Speech

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019

Abstract
In this work, we propose a new formant-based feature for the whispered speaker verification (SV) task, where neutral data is used for enrollment and whispered recordings are used for testing. Such a mismatch between enrollment and test often degrades the performance of whispered SV systems because the acoustic characteristics of whispered and neutral speech differ. We hypothesize that the proposed formant and formant gap (FoG) features capture speaker-specific information in a way that is more invariant to the mode of speech than traditional baseline SV features, including mel frequency cepstral coefficients (MFCC) and auditory-inspired amplitude modulation features (AAMF). Whispered SV experiments with 714 speakers, comprising 29232 neutral and 22932 whispered recordings, reveal that the equal error rate (EER) using the proposed features is lower than that using the best baseline features by ~3.79% (absolute). We also observed that at least four whispered recordings are required during enrollment for the baseline features to perform on par with the proposed features. However, for the neutral SV task, the best-performing baseline features yield an EER that is ~1.88% higher than that of the proposed features.
Keywords
whispered speech, speaker verification, formants
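
The abstract reports its results as equal error rate (EER), the operating point at which the false-acceptance rate and the false-rejection rate of a verification system are equal. The following is a minimal sketch of how EER can be estimated from trial scores; it is not the authors' evaluation code. The function name `equal_error_rate` and the toy scores are hypothetical, and the threshold sweep over observed scores is one simple approximation among several.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate EER by sweeping thresholds over the observed scores.

    Assumption: a trial is accepted when its score is >= the threshold,
    and higher scores mean "more likely the same speaker".
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False-acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False-rejection rate: target trials that fail the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR as the EER estimate.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Hypothetical toy scores, for illustration only (not data from the paper).
targets = [0.9, 0.8, 0.7, 0.3]
impostors = [0.6, 0.4, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

An absolute EER difference, as quoted in the abstract, is the plain subtraction of two such percentages rather than a ratio between them.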