Caching Networks: Capitalizing on Common Speech for ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
We introduce Caching Networks (CachingNets), a speech recognition network architecture capable of delivering faster, more accurate decoding by leveraging common speech patterns. By explicitly incorporating select sentences unique to each user into the network's design, we show how to train the model as an extension of the popular sequence transducer architecture through a multitask learning procedure. We further propose and experiment with different phrase caching policies, which are effective for virtual voice-assistant (VA) applications, to complement the architecture. Our results demonstrate that by pivoting between different inference strategies on the fly, CachingNets can deliver significant performance improvements. Specifically, on an industrial-scale VA ASR task, we observe up to 7.4% relative word error rate (WER) and 11% sentence error rate (SER) improvements with accompanying latency gains.
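The abstract does not specify the paper's caching policies, but the core idea of a per-user phrase cache consulted before full decoding can be illustrated with a minimal sketch. The frequency-based policy, the `PhraseCache` class, and all method names below are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

class PhraseCache:
    """Hypothetical per-user phrase cache: keep the top-k most
    frequently confirmed transcriptions (an assumed policy, not
    necessarily one of the paper's)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.counts = Counter()   # usage statistics per phrase
        self.cached = set()       # current cached phrase set

    def record(self, phrase):
        # Update counts after each confirmed transcription and
        # refresh the cache to the top-k phrases.
        self.counts[phrase] += 1
        self.cached = {p for p, _ in self.counts.most_common(self.capacity)}

    def lookup(self, phrase):
        # At inference time, a cache hit lets the recognizer pivot
        # to a fast path; a miss falls back to full decoding.
        return phrase in self.cached

cache = PhraseCache(capacity=2)
for utt in ["play music", "play music", "set a timer",
            "play music", "what time is it"]:
    cache.record(utt)

print(cache.lookup("play music"))       # True: frequent, cached
print(cache.lookup("what time is it"))  # False: falls back to full decoding
```

The sketch captures only the policy side; in the paper, the cached sentences are additionally built into the network itself and trained via multitask learning on top of a sequence transducer.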
Keywords
automatic speech recognition,latency,streaming,end-to-end,personalization