Direct enhancement of pre-trained speech embeddings for speech processing in noisy conditions

Computer Speech & Language (2023)

Abstract
Lately, the development of deep learning algorithms has marked milestones in the field of speech processing. In particular, the release of pre-trained feature extraction models has considerably simplified the development of speech classification and recognition algorithms. However, environmental noise and reverberation still degrade overall performance, making robustness in noisy conditions mandatory in real-world applications. One way to mitigate the noise effect is to integrate a speech enhancement front-end that removes artifacts from the desired speech signals. Unlike state-of-the-art enhancement approaches that operate either on the speech spectrogram or directly on time-domain signals, in this paper we study how enhancement can be applied directly to the speech embeddings extracted with Wav2Vec and WavLM models. Moreover, we investigate a variety of training approaches, considering different flavors of joint and disjoint training of the speech enhancement front-end with the classification/recognition back-end. We perform exhaustive experiments on the Fluent Speech Commands and Google Speech Commands datasets contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, as well as on the LibriSpeech dataset contaminated with noises from the MUSAN dataset, considering intent classification, keyword spotting, and speech recognition tasks, respectively. Results show that directly enhancing the speech embeddings is a viable, computationally efficient approach, and provide insights into the most promising training approaches.
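The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the extractor below is a stand-in for a frozen pre-trained model such as Wav2Vec or WavLM (not loaded here), the enhancer is the simplest possible residual linear map on embeddings, and the embedding-space MSE loss corresponds to the disjoint-training objective of pulling enhanced noisy embeddings toward clean ones. All names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 768  # typical base embedding size for Wav2Vec/WavLM (assumption)

def extract_embeddings(waveform: np.ndarray, dim: int = DIM) -> np.ndarray:
    """Stand-in for a frozen pre-trained extractor: maps a waveform
    to a (frames, dim) embedding sequence (~20 ms hop at 16 kHz)."""
    frames = max(1, len(waveform) // 320)
    return rng.standard_normal((frames, dim))

def enhance_embeddings(noisy_emb: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Enhancement front-end operating directly on embeddings:
    a residual linear map, the simplest conceivable denoiser."""
    return noisy_emb + noisy_emb @ W

def embedding_mse(enhanced: np.ndarray, clean: np.ndarray) -> float:
    """Disjoint-training objective: mean squared error between
    enhanced noisy embeddings and the clean-speech embeddings."""
    return float(np.mean((enhanced - clean) ** 2))

waveform = rng.standard_normal(16000)   # 1 s of (synthetic) noisy audio
noisy_emb = extract_embeddings(waveform)
W = np.zeros((DIM, DIM))                # zero init => identity enhancer
enhanced = enhance_embeddings(noisy_emb, W)

print(enhanced.shape)  # same (frames, dim) shape as the input embeddings
```

The enhanced sequence keeps the shape the back-end expects, so the enhancer can be dropped between a frozen extractor and an existing classifier without changing either; joint training would simply backpropagate the back-end loss through `W` as well.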
Keywords
Speech enhancement, Automatic speech recognition, Speech embedding, Speech classification