SuperME: Supervised and Mixture-to-Mixture Co-Learning for Speech Enhancement and Robust ASR
arXiv (2024)
Abstract
The current dominant approach for neural speech enhancement is based on
supervised learning by using simulated training data. The trained models,
however, often exhibit limited generalizability to real-recorded data. To
address this, we investigate training models directly on real target-domain
data, and propose two algorithms, mixture-to-mixture (M2M) training and a
co-learning algorithm that improves M2M with the help of supervised algorithms.
When paired close-talk and far-field mixtures are available for training, M2M
realizes speech enhancement by training a deep neural network (DNN) to produce
speech and noise estimates such that they can be linearly filtered to
reconstruct the close-talk and far-field mixtures. This way, the DNN can be
trained directly on real mixtures, and can leverage close-talk mixtures as a
weak supervision to enhance far-field mixtures. To improve M2M, we combine it
with supervised approaches to co-train the DNN, where mini-batches of real
close-talk and far-field mixture pairs and mini-batches of simulated mixture
and clean speech pairs are alternately fed to the DNN, and the loss functions
are respectively (a) the mixture reconstruction loss on the real close-talk and
far-field mixtures and (b) the regular enhancement loss on the simulated clean
speech and noise. We find that, in this way, the DNN can learn from both real
and simulated data, achieving better generalization to real data. We name this
algorithm SuperME, supervised and
mixture-to-mixture co-learning. Evaluation results
on the CHiME-4 dataset show its effectiveness and potential.
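The abstract describes two ingredients: an M2M loss that linearly filters the DNN's speech and noise estimates to reconstruct both the close-talk and far-field mixtures, and a co-learning schedule that alternates real and simulated mini-batches. Below is a minimal NumPy sketch of these two ideas. It is an illustrative assumption, not the paper's implementation: the actual system trains a DNN on spectral representations, whereas here the "estimates" are plain arrays, the linear filters are short time-domain FIR filters solved in closed form by least squares, and the even/odd batch schedule is a hypothetical stand-in for the alternating mini-batch feeding.

```python
import numpy as np

def m2m_loss(s_hat, n_hat, y_close, y_far, taps=4):
    """Hypothetical sketch of the M2M mixture-reconstruction loss: the
    speech estimate s_hat and noise estimate n_hat are linearly filtered
    (causal FIR filters solved per mixture by least squares) so they best
    reconstruct each observed mixture; the residual error is the loss."""
    def delay_stack(x, taps):
        # (T, taps) matrix of delayed copies of x, i.e. a causal FIR basis.
        T = len(x)
        return np.stack(
            [np.concatenate([np.zeros(d), x[:T - d]]) for d in range(taps)],
            axis=1,
        )

    # Columns: delayed speech estimates followed by delayed noise estimates.
    A = np.concatenate(
        [delay_stack(s_hat, taps), delay_stack(n_hat, taps)], axis=1
    )
    loss = 0.0
    for y in (y_close, y_far):
        f, *_ = np.linalg.lstsq(A, y, rcond=None)  # best linear filters
        loss += np.mean((y - A @ f) ** 2)          # reconstruction error
    return loss

def co_learning_loss(batch, step, supervised_loss_fn, m2m_loss_fn):
    """Alternate mini-batches: even steps take simulated (mixture, clean)
    pairs with the regular supervised loss; odd steps take real close-talk
    and far-field pairs with the M2M mixture-reconstruction loss."""
    return supervised_loss_fn(batch) if step % 2 == 0 else m2m_loss_fn(batch)
```

Solving the reconstruction filters in closed form per mixture pair means only the enhancement network itself would need gradient updates; the filters adapt to whatever estimates the network currently produces.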