Word Order Does Not Matter for Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
In this paper, we study the training of an automatic speech recognition system in a weakly supervised setting where the order of words in the transcript labels of the audio training data is not known. We train a word-level acoustic model that aggregates the distribution over all output frames using a LogSumExp operation and uses a cross-entropy loss to match the ground-truth word distribution. Using the pseudo-labels generated by this model on the training set, we then train a letter-based acoustic model with the Connectionist Temporal Classification (CTC) loss. Our system achieves a word error rate of 2.4%/5.3% on the test-clean/test-other subsets of LibriSpeech, which is competitive with the supervised baseline's performance.
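
The word-level objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the softmax normalisation over pooled scores, and the use of normalised word counts as the target are assumptions made here for illustration, since the abstract only states that frame outputs are aggregated with LogSumExp and matched to the ground-truth word distribution with a cross-entropy loss.

```python
import math
from collections import Counter


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bag_of_words_loss(frame_logits, transcript_words, vocab):
    """Cross-entropy between a time-pooled prediction and unordered word counts.

    frame_logits: list of T frames, each a list of len(vocab) scores
                  (hypothetical output of a word-level acoustic model).
    transcript_words: words of the transcript, in any order.
    vocab: list of words; index i corresponds to column i of each frame.
    """
    # Aggregate over time: one LogSumExp score per vocabulary word.
    pooled = [logsumexp([frame[i] for frame in frame_logits])
              for i in range(len(vocab))]
    # Assumption: normalise pooled scores into a log-distribution over words.
    log_z = logsumexp(pooled)
    log_pred = [score - log_z for score in pooled]
    # Assumption: the target is the normalised word-count distribution.
    counts = Counter(transcript_words)
    total = sum(counts.values())
    return -sum(counts[word] / total * log_pred[i]
                for i, word in enumerate(vocab) if counts[word] > 0)


vocab = ["cat", "sat", "the"]
frames = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.0, 0.2, 2.2]]
loss_a = bag_of_words_loss(frames, ["the", "cat", "sat"], vocab)
loss_b = bag_of_words_loss(frames, ["sat", "the", "cat"], vocab)
```

Because the target is built from word counts only, `loss_a` and `loss_b` are identical: shuffling the transcript does not change the training signal, which is the sense in which word order is not needed at this stage.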
Keywords
word order,speech recognition