A Lightweight End-to-End Speech Recognition System on Embedded Devices.

IEICE Trans. Inf. Syst. (2023)

Abstract
In industry, automatic speech recognition has become a competitive feature for embedded products with limited hardware resources. In this work, we propose a tiny end-to-end speech recognition model that is lightweight and easily deployable on edge platforms. First, instead of sophisticated network structures such as recurrent neural networks or transformers, the proposed model mainly uses convolutional neural networks as its backbone. This ensures that the model is supported by most software development kits for embedded devices. Second, we adopt the basic unit of MobileNet-v3, which performs well in computer vision tasks, and integrate the hidden-layer features at different scales, thus compressing the number of model parameters to less than 1 M while achieving an accuracy greater than that of some traditional models. Third, to further reduce CPU computation, we extract acoustic representations directly from 1-dimensional speech waveforms and use a self-supervised learning approach to encourage the convergence of the model. Finally, to cope with relatively weak hardware resources, we use a prefix beam search decoder that dynamically extends the search path with an optimized pruning strategy, together with an additional initialism language model that captures between-word probabilities in advance and thus avoids premature pruning of correct words. In our experiments, across a number of evaluation categories, our end-to-end model outperformed several tiny speech recognition models for embedded devices reported in related work.
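As a rough illustration of the decoding step mentioned in the abstract, the following is a minimal sketch of a generic prefix beam search. It assumes a CTC-style output (per-frame probabilities with a blank symbol at index 0), which the abstract does not state. The function name `prefix_beam_search`, its parameters, and the simple probability-threshold pruning are hypothetical stand-ins; the paper's optimized pruning strategy and initialism language model are not reproduced here.

```python
from collections import defaultdict


def prefix_beam_search(probs, alphabet, beam_width=4, prune_threshold=1e-3, blank=0):
    """Generic CTC-style prefix beam search (illustrative sketch, not the paper's decoder).

    probs: list of frames, each a list of probabilities over `alphabet`.
    alphabet: list of symbols; alphabet[blank] is the blank symbol (assumed).
    """
    # Each prefix stores (p_blank, p_non_blank): the probability that the
    # prefix ends in a blank frame or in a non-blank frame, respectively.
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if p < prune_threshold:
                    continue  # simple pruning: skip very unlikely symbols
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                extended = prefix + (idx,)
                if prefix and prefix[-1] == idx:
                    # A repeated symbol extends the prefix only after a blank;
                    # otherwise it collapses into the existing last symbol.
                    next_beams[extended][1] += p_b * p
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[extended][1] += (p_b + p_nb) * p
        # Keep only the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: (v[0], v[1]) for prefix, v in ranked[:beam_width]}
    best = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]
    return "".join(alphabet[i] for i in best)


# Toy example: three frames over the alphabet {blank, a, b}.
toy_probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
decoded = prefix_beam_search(toy_probs, ["-", "a", "b"])
```

A smaller `beam_width` or a larger `prune_threshold` reduces the work per frame at the risk of discarding the correct prefix early, which is the trade-off the abstract's pruning strategy and language model are described as addressing.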
Keywords
automatic speech recognition, embedded and edge devices, end-to-end, prefix beam search, self-supervised learning