Leveraging cache to enable SLU on tiny devices
CoRR (2023)
Abstract
This paper addresses spoken language understanding (SLU) on
microcontroller-like embedded devices, integrating on-device execution with
cloud offloading in a novel fashion. We exploit temporal locality in a device's
speech inputs and accordingly reuse recent SLU inferences. Our idea is simple:
let the device match new inputs against cached results, and only offload
unmatched inputs to the cloud for full inference. Realization of this idea,
however, is non-trivial: the device needs to compare acoustic features in a
robust, low-cost way. To this end, we present XYZ, a speech cache for tiny
devices. It matches speech inputs at two levels of representation: first as
clustered sequences of raw sound units, then as sequences of phonemes. Working
in tandem, the two representations offer complementary cost/accuracy tradeoffs.
To further boost accuracy, our cache learns: using the inputs that miss the
cache and are offloaded, it continuously fine-tunes the device's feature extractors
(with the assistance of the cloud). We implement XYZ on an off-the-shelf STM32
microcontroller. The resultant implementation has a small memory footprint of
2 MB. Evaluated on challenging speech benchmarks, our system resolves 45% to 90%
of inputs on device, reducing the average latency by up to 80% compared to
offloading to popular cloud speech services. The benefit remains pronounced even
in adversarial settings such as noisy environments, a cold cache, or one device
shared by multiple users.
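
The two-level lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SpeechCache`, the `offload` callable, the use of normalized edit distance as the matching metric, and both thresholds are assumptions made for illustration, and the sound-unit and phoneme sequences are taken as already extracted. On-device fine-tuning of the feature extractors is not modeled.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


class SpeechCache:
    """Hypothetical two-level cache: sound units first, then phonemes."""

    def __init__(self, offload: Callable[[str], str],
                 unit_threshold: float = 0.2,
                 phoneme_threshold: float = 0.3) -> None:
        self.offload = offload                      # assumed cloud SLU call
        self.unit_threshold = unit_threshold        # assumed value
        self.phoneme_threshold = phoneme_threshold  # assumed value
        self.entries: List[Tuple[tuple, tuple, str]] = []

    def _best(self, query: Sequence, index: int, threshold: float):
        """Return the closest cached result within threshold, else None."""
        best, best_d = None, threshold
        for entry in self.entries:
            d = normalized_distance(query, entry[index])
            if d <= best_d:
                best, best_d = entry[2], d
        return best

    def lookup(self, utterance: str, units: Sequence[int],
               phonemes: Sequence[str]) -> Tuple[str, str]:
        """Return (result, source), source in {"units", "phonemes", "cloud"}."""
        # Level 1: match on clustered sound-unit sequences.
        result = self._best(units, 0, self.unit_threshold)
        if result is not None:
            return result, "units"
        # Level 2: match on phoneme sequences. For simplicity the phonemes
        # are passed in up front; a device could compute them only here.
        result = self._best(phonemes, 1, self.phoneme_threshold)
        if result is not None:
            return result, "phonemes"
        # Miss at both levels: offload, then cache the result for reuse.
        result = self.offload(utterance)
        self.entries.append((tuple(units), tuple(phonemes), result))
        return result, "cloud"
```

In this sketch the first utterance always goes to the cloud (cold cache), and later utterances are resolved locally whenever either representation falls within its threshold of a cached entry.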