Boosting keyword spotting through on-device learnable user speech characteristics
arXiv (2024)
Abstract
Keyword spotting systems for always-on TinyML-constrained applications
require on-site tuning to boost the accuracy of offline trained classifiers
when deployed in unseen inference conditions. Adapting to the speech
peculiarities of target users requires many in-domain samples, often
unavailable in real-world scenarios. Furthermore, current on-device learning
techniques rely on computationally intensive and memory-hungry backbone update
schemes, unfit for always-on, battery-powered devices. In this work, we propose
a novel on-device learning architecture, composed of a pretrained backbone and
a user-aware embedding learning the user's speech characteristics. The
so-generated features are fused and used to classify the input utterance. For
domain shifts generated by unseen speakers, we measure error rate reductions of
up to 19
Speech Commands dataset, through the inexpensive update of the user
projections. We moreover demonstrate the few-shot learning capabilities of our
proposed architecture in sample- and class-scarce learning conditions. With
23.7 k parameters and 1 MFLOP per epoch required for on-device training, our
system is feasible for TinyML applications aimed at battery-powered
microcontrollers.
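The architecture described above can be sketched as follows. This is a minimal illustrative NumPy mock-up, not the paper's implementation: all dimensions, the tanh backbone stand-in, and the concatenation-based fusion are assumptions; the only idea taken from the abstract is that a frozen pretrained backbone's features are fused with a small learnable per-user embedding, and on-device adaptation updates only that embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration only (not from the paper).
IN_DIM, FEAT_DIM, EMB_DIM, N_CLASSES, N_USERS = 40, 32, 8, 12, 4

# Frozen pretrained backbone, abstracted here as a fixed random projection.
W_backbone = rng.standard_normal((FEAT_DIM, IN_DIM))

# Learnable per-user embeddings: the only parameters updated on-device.
user_emb = rng.standard_normal((N_USERS, EMB_DIM)) * 0.01

# Classifier head operating on the fused (concatenated) features.
W_head = rng.standard_normal((N_CLASSES, FEAT_DIM + EMB_DIM))

def forward(x, user_id):
    """Fuse frozen backbone features with the user embedding, then classify."""
    feats = np.tanh(W_backbone @ x)                   # backbone features (frozen)
    fused = np.concatenate([feats, user_emb[user_id]])
    return W_head @ fused                             # class logits

def adapt(x, user_id, label, lr=0.01):
    """One on-device step: cross-entropy gradient w.r.t. the user embedding only."""
    feats = np.tanh(W_backbone @ x)
    fused = np.concatenate([feats, user_emb[user_id]])
    logits = W_head @ fused
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[label] -= 1.0                                   # dCE/dlogits
    grad_fused = W_head.T @ p
    user_emb[user_id] -= lr * grad_fused[FEAT_DIM:]   # update only the user slice

x = rng.standard_normal(IN_DIM)                       # e.g. one frame of acoustic features
logits = forward(x, user_id=2)
pred = int(np.argmax(logits))
```

Because the backbone and head stay frozen, each adaptation step costs only a forward pass plus a gradient on the small embedding slice, which is what keeps the per-epoch training budget in the MFLOP range cited above.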