Joint Distribution Learning in the Framework of Variational Autoencoders for Far-Field Speech Enhancement

Mahesh K. Chelimilla, Shashi Kumar, Shakti P. Rath

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019

Abstract
Far-field speech recognition is a challenging task because speech recognizers trained on close-talk speech do not generalize well to far-field speech. To address this mismatch, neural-network-based speech enhancement is typically applied using a denoising autoencoder (DA). Recently, generative models have become increasingly popular, particularly in image generation and translation. One popular technique in this generative framework is the variational autoencoder (VAE). In this paper we consider the VAE for the speech enhancement task in the context of automatic speech recognition (ASR). We propose a novel modification of the conventional VAE that models the joint distribution of the far-field and close-talk features through a common latent space representation, which we refer to as joint-VAE. Unlike the conventional VAE, joint-VAE involves one encoder network that projects the far-field features onto a latent space and two decoder networks that generate close-talk and far-field features separately. Experiments conducted on the AMI corpus show that joint-VAE gives a relative word error rate (WER) improvement of 9% compared to the conventional DA and a relative improvement of 19.2% compared to the mismatched train and test scenario.
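The one-encoder, two-decoder structure described in the abstract can be illustrated with a minimal NumPy sketch of a single forward pass. This is not the authors' implementation: the class name `JointVAESketch`, the layer sizes, the single tanh hidden layer, and the loss (two squared-error reconstruction terms plus a KL term against a standard normal prior) are all assumptions chosen for illustration, since the abstract does not specify the objective or the network details. No training loop is included.

```python
import numpy as np


def init_linear(rng, n_in, n_out):
    """Return (weights, bias) for a small randomly initialised linear layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class JointVAESketch:
    """Hypothetical joint-VAE: one encoder on far-field input, two decoders."""

    def __init__(self, feat_dim=40, hidden_dim=64, latent_dim=16, seed=0):
        # All dimensions are illustrative assumptions, not values from the paper.
        self.rng = np.random.default_rng(seed)
        self.enc_hidden = init_linear(self.rng, feat_dim, hidden_dim)
        self.enc_mu = init_linear(self.rng, hidden_dim, latent_dim)
        self.enc_logvar = init_linear(self.rng, hidden_dim, latent_dim)
        # Two separate decoders that share the same latent variable z.
        self.dec_close = [init_linear(self.rng, latent_dim, hidden_dim),
                          init_linear(self.rng, hidden_dim, feat_dim)]
        self.dec_far = [init_linear(self.rng, latent_dim, hidden_dim),
                        init_linear(self.rng, hidden_dim, feat_dim)]

    def encode(self, x_far):
        """Map far-field features to the mean and log-variance of q(z | x_far)."""
        w, b = self.enc_hidden
        h = np.tanh(x_far @ w + b)
        mu = h @ self.enc_mu[0] + self.enc_mu[1]
        logvar = h @ self.enc_logvar[0] + self.enc_logvar[1]
        return mu, logvar

    def sample(self, mu, logvar):
        """Reparameterisation: z = mu + sigma * eps, with eps ~ N(0, I)."""
        eps = self.rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    @staticmethod
    def decode(layers, z):
        """Run one decoder: tanh hidden layer followed by a linear output."""
        (w1, b1), (w2, b2) = layers
        return np.tanh(z @ w1 + b1) @ w2 + b2

    def forward(self, x_far, x_close):
        """Return both reconstructions and the (assumed) loss terms."""
        mu, logvar = self.encode(x_far)
        z = self.sample(mu, logvar)
        rec_close = self.decode(self.dec_close, z)
        rec_far = self.decode(self.dec_far, z)
        # Assumed objective: squared error for each decoder plus KL(q || N(0, I)).
        err_close = np.mean(np.sum((rec_close - x_close) ** 2, axis=1))
        err_far = np.mean(np.sum((rec_far - x_far) ** 2, axis=1))
        kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
        return {"rec_close": rec_close, "rec_far": rec_far,
                "kl": kl, "loss": err_close + err_far + kl}


# Usage with random stand-in data (8 frames of 40-dimensional features).
data_rng = np.random.default_rng(1)
x_far = data_rng.standard_normal((8, 40))
x_close = data_rng.standard_normal((8, 40))
out = JointVAESketch().forward(x_far, x_close)
print(out["rec_close"].shape, out["rec_far"].shape)
```

In a setup like this, the enhanced features passed to the recognizer would presumably be the close-talk decoder output, with the far-field decoder serving only as an additional training signal; that reading is an inference from the abstract rather than a stated detail.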
Keywords
Variational autoencoders, speech enhancement, far-field speech, close-talking speech