DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite the success of these methods, they require large memory and high pre-training costs, making them inaccessible to researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework that distills hidden representations from a HuBERT model directly. This method reduces HuBERT's size by 75% and makes it 73% faster, while retaining most of its performance across ten different tasks. Moreover, DistilHuBERT requires little training time and data, opening up the possibility of pre-training personal and on-device SSL models for speech.
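To make the multi-task distillation idea concrete, the sketch below shows one plausible way to attach separate prediction heads to a shared student encoder so that each head regresses the hidden states of one teacher layer. It is a minimal illustration, not the authors' implementation: the student encoder is a stand-in (the paper reuses HuBERT's CNN feature extractor plus a shallow transformer), the choice of teacher layers and the exact loss weighting are assumptions, and the teacher hidden states are faked with random tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentWithHeads(nn.Module):
    """Shared student encoder plus one prediction head per distilled teacher layer.

    The encoder here is a hypothetical stand-in; a real setup would use a CNN
    feature extractor and a small transformer operating on raw speech.
    """

    def __init__(self, dim: int = 768, num_teacher_layers: int = 3):
        super().__init__()
        self.student = nn.Sequential(
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )
        # One linear head per teacher layer being distilled (multi-task objective).
        self.heads = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_teacher_layers)])

    def forward(self, feats: torch.Tensor) -> list[torch.Tensor]:
        shared = self.student(feats)              # (batch, time, dim)
        return [head(shared) for head in self.heads]


def distillation_loss(preds, teacher_hiddens, lam: float = 1.0) -> torch.Tensor:
    """L1 regression plus a cosine-similarity term, averaged over heads.

    The combination and weighting are illustrative assumptions, not the paper's
    exact loss definition.
    """
    total = torch.tensor(0.0)
    for pred, target in zip(preds, teacher_hiddens):
        l1 = F.l1_loss(pred, target)
        cos = F.cosine_similarity(pred, target, dim=-1)   # (batch, time)
        total = total + l1 - lam * F.logsigmoid(cos).mean()
    return total / len(preds)


if __name__ == "__main__":
    batch, frames, dim = 2, 100, 768
    feats = torch.randn(batch, frames, dim)               # stand-in input features
    # In practice these would be hidden states from a frozen HuBERT teacher
    # (e.g. selected intermediate layers); random tensors stand in here.
    teacher_hiddens = [torch.randn(batch, frames, dim) for _ in range(3)]

    model = StudentWithHeads(dim=dim, num_teacher_layers=3)
    loss = distillation_loss(model(feats), teacher_hiddens)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

After training, the prediction heads can be discarded and only the small shared encoder kept as the compressed speech representation model, which is what yields the size and speed reduction reported in the abstract.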
Keywords
Self-supervised learning, speech representation learning, knowledge distillation, model compression