
Leveraging Unlabeled Speech for Sequence Discriminative Training of Acoustic Models

INTERSPEECH (2020)

Abstract
State-of-the-art Acoustic Modeling (AM) techniques use long short-term memory (LSTM) networks and apply multiple phases of training on large amounts of labeled acoustic data: initial cross-entropy (CE) training or connectionist temporal classification (CTC) training, followed by sequence discriminative training such as state-level Minimum Bayes Risk (sMBR). Recently, there has been considerable interest in applying Semi-Supervised Learning (SSL) methods that leverage substantial amounts of unlabeled speech to improve AM. This paper proposes a novel Teacher-Student knowledge distillation (KD) approach for sequence discriminative training, in which reference state sequences for unlabeled data are estimated using a strong bidirectional LSTM Teacher model and then used to guide the sMBR training of an LSTM Student model. We build a strong supervised LSTM AM baseline using 45,000 hours of labeled multi-dialect English data for the initial CE or CTC training stage and 11,000 hours of its British English subset for the sMBR training phase. To demonstrate the efficacy of the proposed approach, we leverage an additional 38,000 hours of unlabeled British English data at the sMBR stage only, which yields relative Word Error Rate (WER) improvements in the range of 6% to 11% over the supervised baselines in clean and noisy test conditions.
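
The abstract's core recipe is to run a strong bidirectional Teacher model over unlabeled speech to estimate reference state sequences, and then use those sequences to supervise the Student's sequence discriminative stage. The sketch below illustrates that flow under loose assumptions: all class names, layer sizes, and state counts (TeacherBLSTM, StudentLSTM, NUM_STATES) are hypothetical rather than taken from the paper, and a frame-level cross-entropy against the Teacher's pseudo state sequence stands in for the true sMBR objective, which requires lattice-based expected-risk computation (as in toolkits such as Kaldi).

```python
# Minimal sketch of Teacher-Student pseudo-labeling for the sequence
# discriminative stage. All names and sizes are illustrative assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn

NUM_STATES = 512  # hypothetical number of tied HMM states (senones)
FEAT_DIM = 80     # hypothetical acoustic feature dimension (e.g., log-mel)

class TeacherBLSTM(nn.Module):
    """Strong bidirectional-LSTM Teacher (sizes are illustrative)."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, 512, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 512, NUM_STATES)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)  # (batch, time, NUM_STATES) logits

class StudentLSTM(nn.Module):
    """Unidirectional-LSTM Student, suitable for streaming recognition."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, 512, num_layers=3, batch_first=True)
        self.out = nn.Linear(512, NUM_STATES)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)

teacher = TeacherBLSTM().eval()  # assumed already trained on labeled data
student = StudentLSTM()          # assumed CE/CTC-pretrained on labeled data
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

def pseudo_state_sequence(features):
    """Estimate a reference state sequence for unlabeled speech with the Teacher."""
    with torch.no_grad():
        return teacher(features).argmax(dim=-1)  # (batch, time) state ids

# One update on an unlabeled batch. Frame-level CE against the pseudo states
# is a simplified stand-in for the sMBR criterion described in the abstract.
features = torch.randn(8, 200, FEAT_DIM)  # dummy unlabeled batch
ref_states = pseudo_state_sequence(features)
logits = student(features)
loss = nn.functional.cross_entropy(logits.reshape(-1, NUM_STATES),
                                   ref_states.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```

In the paper's setting, the Teacher's estimated state sequences would instead define the reference for the sMBR numerator, so the Student is pushed toward the Teacher's alignment at the sequence level rather than frame by frame.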
Key words
Automatic Speech Recognition, Semi-Supervised Learning, Connectionist Temporal Classification, sMBR, Unlabeled Data