Less Is More: A Unified Architecture for Device-Directed Speech Detection with Multiple Invocation Types

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Suppressing unintended invocation of the device caused by speech that sounds like the wake word, or by accidental button presses, is critical for a good user experience, and is referred to as False-Trigger-Mitigation (FTM). When multiple invocation options are available, the traditional approach to FTM is to use either invocation-specific models or a single model for all invocations. Both approaches are sub-optimal: the memory cost of the former grows linearly with the number of invocation options, which is prohibitive for on-device deployment, and it does not take advantage of shared training data; the latter is unable to accurately capture acoustic differences across invocation types. To this end, we propose a Unified Acoustic Detector (UAD) for FTM when multiple invocation options are available on device. The proposed UAD is trained using a multi-task learning framework, where a jointly trained acoustic encoder model is augmented with invocation-specific classification layers. In the context of the FTM task, we show for the first time that by using a shared model architecture across invocations (thus keeping the model size similar to that of a monolithic model used for a single invocation type), we can not only match but substantially improve on the accuracy of the invocation-specific models. In particular, in the challenging case of touch-based invocation, we obtain 50% and 35% relative improvements in false positive rate at 99% true positive rate, when compared with a single-output model for both invocations and with separate models per invocation, respectively. Furthermore, we propose streaming and non-streaming variants of the UAD, and show that they both outperform a traditional ASR-based approach to FTM.
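The shared-encoder design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder, the features, or the layer sizes, so the class name `UnifiedDetectorSketch`, the mean-pooling encoder, the dimensions, and the invocation labels `"wake_word"` and `"touch"` are all assumptions made for illustration, and training is omitted entirely. The sketch only shows how one jointly used encoder feeds a small classification head per invocation type, and why the parameter count stays close to that of a single model.

```python
import math
import random


def linear(x, weights, bias):
    """Apply y = W x + b to a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def count_params(layer):
    """Number of weights plus biases in one (weights, bias) layer."""
    weights, bias = layer
    return sum(len(row) for row in weights) + len(bias)


class UnifiedDetectorSketch:
    """Hypothetical model: one shared encoder, one head per invocation type."""

    def __init__(self, feat_dim, hidden_dim, invocation_types, seed=0):
        rng = random.Random(seed)
        # Shared across all invocation types.
        self.encoder = init_linear(rng, hidden_dim, feat_dim)
        # One small binary classification layer per invocation type.
        self.heads = {name: init_linear(rng, 1, hidden_dim)
                      for name in invocation_types}

    def encode(self, frames):
        """Mean-pool frame features over time, then apply the shared layer."""
        pooled = [sum(col) / len(frames) for col in zip(*frames)]
        return [math.tanh(h) for h in linear(pooled, *self.encoder)]

    def score(self, frames, invocation_type):
        """Device-directed probability from the head matching the invocation."""
        logit = linear(self.encode(frames), *self.heads[invocation_type])[0]
        return 1.0 / (1.0 + math.exp(-logit))

    def num_parameters(self):
        return count_params(self.encoder) + sum(
            count_params(head) for head in self.heads.values())


# Toy usage: two frames of 4-dimensional features (made-up values).
model = UnifiedDetectorSketch(feat_dim=4, hidden_dim=8,
                              invocation_types=["wake_word", "touch"])
frames = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]
p_touch = model.score(frames, "touch")
p_wake = model.score(frames, "wake_word")
```

In this toy configuration the encoder has 40 parameters and each head has 9, so the unified model has 58 parameters, whereas two fully separate models of the same shape would need 98. Each added invocation type costs only one extra head, which is the memory argument the abstract makes against invocation-specific models.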
Keywords
smart assistant,false trigger mitigation,intent classification,streaming,multi-task learning