Investigation of Cost Function for Supervised Monaural Speech Separation

INTERSPEECH (2019)

Abstract
Speech separation aims to improve the quality of noisy speech. Deep-learning-based speech separation methods usually use the mean square error (MSE) as the cost function, which measures the distance between the model output and the training target. However, the MSE does not match the evaluation metrics perfectly: optimizing it does not directly improve the commonly used metrics, such as short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), signal-to-noise ratio (SNR), and source-to-distortion ratio (SDR). In this study, we inspect other cost function candidates based on divergences, e.g., the Kullback-Leibler and Itakura-Saito divergences. We propose and examine a conjecture about the correlation between cost functions and evaluation metrics to explain why these cost functions behave differently. On the basis of this conjecture, the optimal cost function candidate is selected. The experimental results validate our conjecture.
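To make the cost functions named in the abstract concrete, the following is a minimal sketch of the MSE alongside the generalized Kullback-Leibler and Itakura-Saito divergences in their standard textbook forms. The abstract does not give the paper's exact formulation, so everything here is an assumption for illustration: the function names, the averaging over elements, the `EPS` floor, and the use of small lists of nonnegative values as stand-ins for spectral magnitudes.

```python
import math

EPS = 1e-8  # assumed floor so that logs and ratios stay finite


def _clip(values, eps=EPS):
    """Clamp each value to at least eps (assumption, not from the paper)."""
    return [max(v, eps) for v in values]


def mse(target, estimate):
    """Mean square error between two equal-length sequences."""
    return sum((t - e) ** 2 for t, e in zip(target, estimate)) / len(target)


def kl_divergence(target, estimate):
    """Generalized KL divergence D(target || estimate), averaged over elements."""
    t_c, e_c = _clip(target), _clip(estimate)
    total = sum(t * math.log(t / e) - t + e for t, e in zip(t_c, e_c))
    return total / len(t_c)


def is_divergence(target, estimate):
    """Itakura-Saito divergence D(target || estimate), averaged over elements."""
    t_c, e_c = _clip(target), _clip(estimate)
    total = 0.0
    for t, e in zip(t_c, e_c):
        ratio = t / e
        total += ratio - math.log(ratio) - 1.0
    return total / len(t_c)


# Hypothetical toy "spectra": the estimate is twice the target everywhere.
target = [1.0, 2.0, 0.5, 4.0]
estimate = [2.0 * t for t in target]

print("MSE:", mse(target, estimate))
print("KL :", kl_divergence(target, estimate))
print("IS :", is_divergence(target, estimate))
```

One property worth noting in this sketch: if both inputs are multiplied by the same positive constant c, the MSE grows by c squared, the generalized KL divergence grows by c, and the Itakura-Saito divergence is unchanged. The three costs therefore weight high-energy and low-energy components differently, which is one plausible reason they could relate differently to the evaluation metrics.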
Key words
divergence, deep neural networks, cost function, speech separation