Phonetically Induced Subwords for End-to-End Speech Recognition

Interspeech (2021)

Abstract
End-to-end automatic speech recognition systems map a sequence of acoustic features to text. In modern systems, text is encoded as grapheme subwords generated by methods designed for text-processing tasks, and these subwords therefore do not model or take advantage of the statistics of the acoustic features. Here, we present a novel method for generating grapheme subwords derived from phoneme sequences, thereby capturing phonetic statistics. The phonetically induced subwords can be used for training and inference in any system that benefits from subwords, regardless of architecture and without the need for a pronunciation lexicon. We compare our method with other commonly used methods, based either on text statistics or on text-phoneme correspondence, and present experiments on CTC and RNN-T architectures, evaluating subword sets of different sizes. We find that our phonetically induced subwords can improve the performance of RNN-T models, with relative improvements of up to 15.21% compared to other subword methods.
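For context, the text-statistics subword methods the abstract contrasts against include byte-pair encoding (BPE) and WordPiece. The sketch below is a minimal, assumed illustration of BPE-style merge learning on graphemes: it counts adjacent symbol pairs in text only and never looks at pronunciation, which is the limitation the abstract points out. It is not the phonetically induced method, whose algorithm the abstract does not specify; the function name `learn_bpe_merges` and the toy word list are hypothetical.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Hypothetical helper: learn BPE merge rules from text statistics only."""
    # Each word starts as a tuple of single characters (graphemes) with a count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe_merges(["low", "low", "lower", "lowest"], 2))
```

On the toy list, the first two merges are `("l", "o")` and then `("lo", "w")`, chosen purely from character co-occurrence counts in the text.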
Keywords
word piece, RNN-T, CTC, sequence-to-sequence, phoneme