Evaluation of a Speech Enhancement Method Combining Ensemble Time-Frequency Masking and Beamforming

Journal of the Robotics Society of Japan (2022)

Abstract
Deep learning has greatly improved the performance of automatic speech recognition. However, recognition accuracy still degrades under severe environmental noise, because words and speech segments are falsely detected more often. To address this problem, many speech enhancement methods have been proposed that suppress the noise and emphasize only the target speech. In most cases, speech enhancement requires some assumptions about the sound source. In addition, conventional speech enhancement methods do not fully exploit the key features of the input signal because they rely on a single model or network. In this paper, we report a speech enhancement method based on beamforming with an ensemble time-frequency mask. The ensemble time-frequency mask is generated by estimating time-frequency masks with multiple speech enhancement methods and integrating them. Using masks estimated by multiple methods is expected to improve the robustness of the process. We evaluated the proposed method on the CHiME-3 dataset using PESQ and STOI, two objective metrics that correlate with human auditory perception. On both metrics, the proposed method outperforms the same method without the ensemble, indicating its effectiveness. We also conducted a validation experiment on the ensemble strategy used in the proposed method.
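The pipeline described in the abstract (estimate several time-frequency masks, integrate them, then beamform) can be illustrated with a minimal NumPy sketch. The abstract does not say how the masks are integrated or which beamformer is used, so the choices here are assumptions made purely for illustration: the masks are combined by a weighted average, and the beamformer is a mask-based MVDR in the reference-channel formulation. The function names (`ensemble_mask`, `masked_covariance`, `mvdr_weights`, `beamform`) and the array layout (channels, frequency bins, frames) are hypothetical and not taken from the paper.

```python
import numpy as np


def ensemble_mask(masks, weights=None):
    """Integrate several (F, T) masks with values in [0, 1].

    Assumption: weighted averaging; the paper's integration rule may differ.
    """
    stacked = np.stack(masks, axis=0)                    # (M, F, T)
    if weights is None:
        weights = np.ones(len(masks))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                    # normalise to sum to 1
    combined = np.tensordot(weights, stacked, axes=1)    # (F, T)
    return np.clip(combined, 0.0, 1.0)


def masked_covariance(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array (C, F, T); mask: real array (F, T).
    Returns an array of shape (F, C, C).
    """
    weighted = stft * mask[None, :, :]
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-10):
    """MVDR filter w = (Phi_n^-1 Phi_s) u_ref / trace(Phi_n^-1 Phi_s).

    phi_s, phi_n: (F, C, C) speech and noise covariances.
    Returns weights of shape (F, C).
    """
    n_ch = phi_n.shape[-1]
    phi_n = phi_n + eps * np.eye(n_ch)                   # small diagonal loading
    num = np.linalg.solve(phi_n, phi_s)                  # Phi_n^-1 Phi_s, (F, C, C)
    trace = np.trace(num, axis1=-2, axis2=-1)[:, None]   # (F, 1)
    return num[:, :, ref] / (trace + eps)


def beamform(stft, w):
    """Apply y[f, t] = w[f]^H x[f, t] to a (C, F, T) STFT."""
    return np.einsum("fc,cft->ft", w.conj(), stft)
```

In such a sketch, the integrated speech mask `m` would give `phi_s = masked_covariance(X, m)` and its complement would give `phi_n = masked_covariance(X, 1 - m)`, after which `beamform(X, mvdr_weights(phi_s, phi_n))` yields a single-channel enhanced STFT. Averaging is only one possible integration rule; a median or a learned combination would fit the same interface.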
Key words
speech enhancement, time-frequency