REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
CoRR (2024)
Abstract
Unsupervised automatic speech recognition (ASR) aims to learn the mapping
between the speech signal and its corresponding textual transcription without
the supervision of paired speech-text data. A word or phoneme in speech
corresponds to a variable-length segment with unknown boundaries, and this
segmental structure makes learning the mapping between speech and text
challenging, especially without paired data. In this paper, we
propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative
Training for Unsupervised ASR. REBORN alternates between (1) training a
segmentation model that predicts the boundaries of the segmental structures in
speech signals and (2) training a phoneme prediction model that takes the
segments produced by the segmentation model as input and predicts a phoneme
transcription. Since supervised data for training the segmentation model is not
available, we use reinforcement learning to train the segmentation model to
favor segmentations that yield phoneme sequence predictions with a lower
perplexity. We conduct extensive experiments and find that under the same
setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech,
TIMIT, and five non-English languages in Multilingual LibriSpeech. We
comprehensively analyze why the boundaries learned by REBORN improve
unsupervised ASR performance.
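The reinforcement-learning step described above can be illustrated with a toy sketch. This is not the REBORN implementation, and every name in it is hypothetical. It assumes integer-like "frames" that already carry phoneme labels, a two-parameter Bernoulli boundary policy conditioned on a made-up "label changed" feature, a frozen majority-vote phoneme predictor standing in for the trained one, and an add-k bigram phoneme language model fit on unpaired text. The reward is the negative log-perplexity of the predicted phoneme sequence (a monotone transform of perplexity), and the policy is updated with REINFORCE using a leave-one-out baseline. Only stage (1) of the alternation is shown; stage (2), retraining the phoneme predictor on the new segments, is omitted.

```python
import math
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def train_bigram_lm(corpus, k=0.1):
    """Add-k smoothed bigram LM over phoneme strings (unpaired text only)."""
    counts = defaultdict(Counter)
    vocab = {EOS}
    for sent in corpus:
        vocab.update(sent)
        tokens = [BOS] + list(sent) + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    def logprob(prev, nxt):
        row = counts[prev]
        return math.log((row[nxt] + k) / (sum(row.values()) + k * len(vocab)))

    return logprob


def log_perplexity(phonemes, logprob):
    """Mean negative log-likelihood per predicted token, including </s>."""
    tokens = [BOS] + list(phonemes) + [EOS]
    nll = -sum(logprob(p, n) for p, n in zip(tokens, tokens[1:]))
    return nll / (len(tokens) - 1)


def predict_phonemes(frames, boundaries):
    """Hypothetical frozen predictor: one phoneme (majority label) per segment.

    boundaries[i] == 1 starts a new segment at frames[i + 1]; frame 0 always
    starts the first segment.
    """
    segments, current = [], [frames[0]]
    for frame, is_boundary in zip(frames[1:], boundaries):
        if is_boundary:
            segments.append(current)
            current = []
        current.append(frame)
    segments.append(current)
    return "".join(Counter(seg).most_common(1)[0][0] for seg in segments)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_segmenter(frames, logprob, iters=600, batch=32, lr=0.2, seed=0):
    """REINFORCE on per-frame Bernoulli boundaries; reward = -log-perplexity."""
    rng = random.Random(seed)
    # Hypothetical 1-bit feature per candidate boundary: did the label change?
    feats = [int(a != b) for a, b in zip(frames, frames[1:])]
    theta = [0.0, 0.0]  # boundary logits for feature 0 (same) and 1 (changed)
    for _ in range(iters):
        probs = [sigmoid(theta[f]) for f in feats]
        samples = []
        for _ in range(batch):
            bounds = [int(rng.random() < p) for p in probs]
            reward = -log_perplexity(predict_phonemes(frames, bounds), logprob)
            samples.append((bounds, reward))
        total = sum(r for _, r in samples)
        grad = [0.0, 0.0]
        for bounds, reward in samples:
            # Leave-one-out baseline keeps the gradient estimate unbiased.
            advantage = reward - (total - reward) / (batch - 1)
            for f, b, p in zip(feats, bounds, probs):
                # d/dtheta log Bernoulli(b; sigmoid(theta)) = b - p
                grad[f] += advantage * (b - p) / batch
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent
    return theta


corpus = ["abc", "abcabc", "abcabcabc"]  # unpaired phoneme "text"
frames = list("aaabbcccaabbbcc")  # toy "speech" whose transcript is abcabc
lm = train_bigram_lm(corpus)
theta = train_segmenter(frames, lm)
greedy = [int(sigmoid(theta[int(a != b)]) > 0.5) for a, b in zip(frames, frames[1:])]
print(theta, predict_phonemes(frames, greedy))
```

In this toy, over-segmentation emits repeated phonemes and under-segmentation drops phonemes; both produce bigrams the language model rarely saw, which raises perplexity. The reward therefore pushes boundary probability up where the label changes and down elsewhere, which is the intuition behind rewarding lower-perplexity segmentations.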