Reducing the Gap Between Streaming and Non-Streaming Transducer-Based ASR by Adaptive Two-Stage Knowledge Distillation

Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, Ming’En Zhao

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
The transducer is one of the mainstream frameworks for streaming speech recognition. There is a performance gap between streaming and non-streaming transducer models due to the limited context available to the streaming model. An effective way to reduce this gap is to make their hidden and output distributions consistent, which can be achieved by hierarchical knowledge distillation. However, it is difficult to enforce this consistency for both distributions simultaneously, because learning the output distribution depends on the hidden one. In this paper, we propose an adaptive two-stage knowledge distillation method consisting of hidden-layer learning and output-layer learning. In the first stage, we learn hidden representations with full context by applying a mean squared error loss. In the second stage, we design a power-transformation-based adaptive smoothness method to learn a stable output distribution. The method achieves a 19% relative reduction in word error rate and a faster response for the first token compared with the original streaming model on the LibriSpeech corpus.
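To make the two stages concrete, the sketch below illustrates the kinds of losses the abstract describes: a mean squared error loss between streaming (student) and full-context (teacher) hidden representations, and an output-level distillation loss in which the teacher distribution is smoothed by a power transformation before a KL divergence is applied. This is a minimal PyTorch-style sketch under assumptions: the function names, the fixed exponent alpha, and the use of KL divergence for the output stage are illustrative choices, not the authors' published implementation, and the paper's adaptive selection of the smoothing exponent is not reproduced here.

```python
import torch
import torch.nn.functional as F


def hidden_distill_loss(student_hidden: torch.Tensor,
                        teacher_hidden: torch.Tensor) -> torch.Tensor:
    # Stage 1: match the streaming model's hidden representations to the
    # full-context (non-streaming) teacher's with mean squared error.
    return F.mse_loss(student_hidden, teacher_hidden)


def power_smooth(probs: torch.Tensor, alpha: float, eps: float = 1e-8) -> torch.Tensor:
    # Power transformation p_i^alpha / sum_j p_j^alpha over the vocabulary axis.
    # alpha < 1 flattens (smooths) the distribution; alpha = 1 leaves it unchanged.
    powered = probs.clamp_min(eps).pow(alpha)
    return powered / powered.sum(dim=-1, keepdim=True)


def output_distill_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        alpha: float = 0.5) -> torch.Tensor:
    # Stage 2: KL divergence between the power-smoothed teacher output
    # distribution and the student's output distribution. A fixed alpha is
    # used here for illustration; the paper adapts the smoothness instead.
    teacher_probs = power_smooth(F.softmax(teacher_logits, dim=-1), alpha)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

In a two-stage schedule, the hidden-layer loss would be applied first to align intermediate representations, and the output-layer loss afterwards, once the hidden representations have stabilized.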
Keywords
Speech Recognition,Conformer Transducer,Knowledge Distillation,Power Transformation