Analysis Of Dnn Approaches To Speaker Identification

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2016)

引用 128|浏览126
暂无评分
摘要
This work studies the usage of the Deep Neural Network (DNN) Bottleneck (BN) features together with the traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the sufficient statistics extraction by using separate GMM models for frame alignment, and for statistics normalization and we analyze the usage of BN and MFCC features (and their concatenation) in the two stages. We also show the effect of using full-covariance GMM models, and, as a contrast, we compare the result to the recent DNN-alignment approach. On the NIST SRE2010, telephone condition, we show 60% relative gain over the traditional MFCC baseline for EER (and similar for the NIST DCF metrics), resulting in 0.94% EER.
更多
查看译文
关键词
automatic speaker identification,deep neural networks,bottleneck features,i-vector
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要