Efficient Cascaded Streaming ASR System Via Frame Rate Reduction.

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2023)

Abstract
In this paper, we explore various frame rate reduction schemes for the two-pass cascaded encoder model to improve its efficiency without sacrificing transcription quality. We conduct extensive studies of frame rate reduction strategies, left and right context window lengths, trade-offs among quality, latency, computation, and power consumption, and performance on short- and long-form datasets. With the proposed schemes, we can lower the 2nd-pass frame rate to 120 ms, half of the 1st pass's. This achieves a 20% RTF reduction, a 13% power saving, and 19% lower final latency, with no impact on word error rate or on the latency of partial results. If an increase in partial latency is acceptable, we can further reduce the frame rate from the 1st pass's to 180 ms or even 240 ms and obtain 45% RTF and 35% power savings, with similar or even better (on the short-form test set) recognition accuracy.
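The frame-count arithmetic behind these settings can be sketched as follows. This is only an illustration, not the paper's scheme: the abstract does not say how frames are merged, so average pooling over time is an assumption, `reduce_frame_rate` is a hypothetical helper, and the 60 ms 1st-pass frame duration is inferred from the statement that 120 ms is half of the 1st pass's frame rate.

```python
def reduce_frame_rate(frames, factor):
    """Average-pool every `factor` consecutive frames (hypothetical helper).

    `frames` is a list of equal-length feature vectors, one per 1st-pass frame.
    Average pooling is an assumption; the abstract does not name the method.
    """
    usable = (len(frames) // factor) * factor  # drop a trailing partial group
    pooled = []
    for start in range(0, usable, factor):
        group = frames[start:start + factor]
        # Mean of each feature dimension across the grouped frames.
        pooled.append([sum(column) / factor for column in zip(*group)])
    return pooled


FIRST_PASS_MS = 60  # inferred from the abstract, not stated directly

# 100 toy frames with 2 features each: 6 seconds of audio at 60 ms per frame.
first_pass_frames = [[float(i), float(i)] for i in range(100)]

for factor in (2, 3, 4):  # 120 ms, 180 ms, 240 ms
    second_pass_frames = reduce_frame_rate(first_pass_frames, factor)
    print(f"{FIRST_PASS_MS * factor} ms: {len(second_pass_frames)} frames")
```

Under these assumptions the 2nd pass processes one half, one third, or one quarter as many frames as the 1st pass. Fewer frames would plausibly lower computation, but the sketch does not model the RTF or power figures reported above.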
Keywords
On-device ASR, cascaded streaming model, frame rate reduction