Analyzing The Robustness of Unsupervised Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
Unsupervised speech recognition (unsupervised ASR) aims to learn an ASR system from non-parallel speech and text corpora only. Wav2vec-U has shown promising results in unsupervised ASR by coupling self-supervised speech representations with Generative Adversarial Network (GAN) training, but the robustness of this framework is unknown. In this work, we analyze the robustness of unsupervised ASR in mismatch scenarios where the domains of the unpaired speech and text differ. We consider three domain mismatch scenarios: (1) using speech and text from different datasets, (2) using noisy/spontaneous speech, and (3) adjusting the amount of speech and text. We also quantify the degree of domain mismatch by calculating the JS-divergence of phoneme n-gram distributions between the transcription of the speech and the text. This metric correlates highly with performance. Experimental results show that domain mismatch leads to inferior performance, but a self-supervised model pre-trained on the target speech domain can extract better representations and alleviate the performance drop.
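The mismatch metric described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function names `ngram_distribution` and `js_divergence`, the choice of base-2 logarithms (which bounds the value between 0 and 1), unsmoothed relative frequencies, and the toy phoneme sequences.

```python
from collections import Counter
from math import log2


def ngram_distribution(sequences, n):
    """Relative frequency of each phoneme n-gram over a list of phoneme sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no n-grams of the requested order in the input")
    return {gram: c / total for gram, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two n-gram distributions."""
    div = 0.0
    for gram in set(p) | set(q):
        pi = p.get(gram, 0.0)
        qi = q.get(gram, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


# Hypothetical toy data: phoneme sequences from the speech transcriptions
# and from the unpaired text corpus (not taken from the paper).
speech_phonemes = [["HH", "AH", "L", "OW"], ["W", "ER", "L", "D"]]
text_phonemes = [["HH", "AH", "L", "OW"], ["G", "UH", "D"]]

p = ngram_distribution(speech_phonemes, n=2)
q = ngram_distribution(text_phonemes, n=2)
print(f"bigram JS-divergence: {js_divergence(p, q):.3f}")
```

Under these assumptions, a value near 0 indicates that the two corpora share nearly the same phoneme n-gram statistics, while a value near 1 indicates almost no overlap.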
Keywords
Unsupervised ASR, Generative Adversarial Network, Robustness