Caching Networks: Capitalizing on Common Speech for ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. By explicitly incorporating select sentences unique to each user into the network's design, we show how to train the model as an extension of the popular sequence transducer architecture through a multitask learning procedure. We further propose and experiment with different phrase caching policies, which are effective for virtual voice-assistant (VA) applications, to complement the architecture. Our results demonstrate that by pivoting between different inference strategies on the fly, CachingNets can deliver significant performance improvements. Specifically, on an industrial-scale VA ASR task, we observe up to 7.4% relative word error rate (WER) and 11% sentence error rate (SER) improvements with accompanying latency gains.
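The abstract does not specify the paper's caching policies, but the core idea of a per-user phrase cache consulted before full decoding can be illustrated with a minimal sketch. The frequency-based policy, the `PhraseCache` class, and all method names below are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

class PhraseCache:
    """Hypothetical per-user phrase cache: keep the top-k most
    frequently confirmed transcriptions (an assumed policy, not
    necessarily one of the paper's)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.counts = Counter()   # usage statistics per phrase
        self.cached = set()       # current cached phrase set

    def record(self, phrase):
        # Update counts after each confirmed transcription and
        # refresh the cache to the top-k phrases.
        self.counts[phrase] += 1
        self.cached = {p for p, _ in self.counts.most_common(self.capacity)}

    def lookup(self, phrase):
        # At inference time, a cache hit lets the recognizer pivot
        # to a fast path; a miss falls back to full decoding.
        return phrase in self.cached

cache = PhraseCache(capacity=2)
for utt in ["play music", "play music", "set a timer",
            "play music", "what time is it"]:
    cache.record(utt)

print(cache.lookup("play music"))       # True: frequent, cached
print(cache.lookup("what time is it"))  # False: falls back to full decoding
```

The sketch captures only the policy side; in the paper, the cached sentences are additionally built into the network itself and trained via multitask learning on top of a sequence transducer.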
Keywords
automatic speech recognition,latency,streaming,end-to-end,personalization