LVCSR with Transformer Language Models.

INTERSPEECH (2020)

Abstract
Neural network language models (LMs) based on self-attention have recently outperformed the previous state of the art, LSTM LMs. Transformer LMs are today mostly used as a post-processing step in lattice or n-best list rescoring. In this work, the main focus is on using them in one-pass recognition. We show that a simple reduction of redundant computations in batched self-attention yields a 15% reduction in the overall real-time factor (RTF) of a well-tuned system. We also show that, with proper initialization, the layer normalization inside the residual blocks can be removed, giving a further increase in forwarding speed. This is done under the constraint of staying close to the state of the art in word error rate (5.4% on LibriSpeech test-other) while achieving a real-time factor of around 1. Last but not least, we present an approach that speeds up classic push-forward rescoring by combining it with n-best list rescoring to better exploit the inherent parallelizability of Transformer language models, cutting the time needed for rescoring in half.
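The abstract only states the redundancy reduction at a high level; the sketch below is a minimal Python/NumPy illustration of one common way to avoid recomputing work in step-wise self-attention during one-pass decoding: caching the keys and values of already-scored prefix positions so that only the newly hypothesized token is projected at each step. The single-head attention, dimensions, weight matrices, and helper names are illustrative assumptions, not the paper's actual model or batching scheme.

```python
# Minimal sketch (assumed, not the paper's implementation): incremental
# self-attention with a key/value cache versus re-encoding the full prefix
# at every decoding step.
import numpy as np

rng = np.random.default_rng(0)
d_model = 8
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_full(prefix):
    """Naive step: re-project the whole prefix (T x d_model) every time."""
    q = prefix[-1:] @ W_q          # query for the newest position only
    k = prefix @ W_k               # keys recomputed for all T positions
    v = prefix @ W_v               # values recomputed for all T positions
    scores = q @ k.T / np.sqrt(d_model)
    return softmax(scores) @ v

def attend_cached(new_x, cache):
    """Incremental step: project only the new token, reuse cached K/V."""
    q = new_x @ W_q
    new_k, new_v = new_x @ W_k, new_x @ W_v
    cache["k"] = np.vstack([cache["k"], new_k]) if cache["k"].size else new_k
    cache["v"] = np.vstack([cache["v"], new_v]) if cache["v"].size else new_v
    scores = q @ cache["k"].T / np.sqrt(d_model)
    return softmax(scores) @ cache["v"]

# Both variants give the same attention output for the last position, but the
# cached variant never re-projects the already-seen prefix positions.
prefix = rng.standard_normal((5, d_model))
cache = {"k": np.empty((0, d_model)), "v": np.empty((0, d_model))}
for t in range(prefix.shape[0]):
    out_cached = attend_cached(prefix[t:t + 1], cache)
out_full = attend_full(prefix)
assert np.allclose(out_full, out_cached)
```

In a batched one-pass recognizer, the same idea applies per hypothesis in the batch: each step scores one new token against cached prefix keys/values, which is where redundant computation can be cut without changing the scores.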
Keywords
speech recognition, decoding, Transformer language model