Analyzing And Improving Neural Speaker Embeddings for ASR

Speech Communication; 15th ITG Conference (2023)

Abstract
Neural speaker embeddings encode a speaker’s speech characteristics through a DNN model and are prevalent in speaker verification tasks. However, only a few inconclusive studies have investigated the use of neural speaker embeddings in an ASR system. In this work, we present our efforts on integrating neural speaker embeddings into a Conformer-based hybrid HMM ASR system. For ASR, our improved embedding extraction pipeline, in combination with the Weighted-Simple-Add integration method, results in x-vectors and c-vectors performing on par with i-vectors. We further analyze, compare, and combine different speaker embeddings. We improve our already strong baseline by switching to a one-cycle learning rate schedule while also reducing the training time. By further adding neural speaker embeddings, we gain additional improvements. As a result, our best Conformer-based hybrid ASR system with speaker embeddings achieves 9.0% WER on Hub5’00 and Hub5’01 while training only on SWB 300h.
Keywords
neural speaker embeddings