Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

We address the effective finetuning of a large-scale pretrained model for automatic speech recognition (ASR) of low-resource languages when only a one-hour matched dataset is available. The finetuning consists of domain adaptation and language adaptation, each conducted with heterogeneous datasets that match the target in either domain or language, but not both. For effective adaptation, we incorporate domain identification and language identification as auxiliary tasks through multi-task learning. Moreover, the embedding produced by the auxiliary tasks is fused with the encoder output of the pretrained model for ASR. Experimental evaluations on Khmer ASR using the ECCC corpus (the Extraordinary Chambers in the Courts of Cambodia) demonstrate that conducting domain adaptation first and language adaptation second is effective. In addition, multi-tasking with domain identification and fusing the domain ID embedding gives the best performance, improving the character error rate (CER) by 6.47% absolute over the baseline finetuning method.
Keywords: Speech recognition, low-resource language, domain adaptation, language adaptation, Khmer language, self-supervised pretraining
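
The abstract reports its gain as an absolute CER improvement, so it may help to see how CER is commonly computed and how an absolute gain differs from a relative one. The following is a minimal sketch, not the authors' evaluation script: the function names are hypothetical, characters are assumed to be Unicode code points (the abstract does not say how Khmer characters are segmented), and the numbers in the usage note are made up for illustration rather than taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level character error rate in percent.

    Assumption: one code point counts as one character.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_chars


def absolute_gain(baseline_cer: float, new_cer: float) -> float:
    """Improvement in percentage points (what '% absolute' refers to)."""
    return baseline_cer - new_cer


def relative_gain(baseline_cer: float, new_cer: float) -> float:
    """Improvement as a percentage of the baseline error rate."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer
```

With illustrative values, a baseline CER of 20.0% reduced to 15.0% is a 5.0% absolute improvement (`absolute_gain(20.0, 15.0)`) but a 25.0% relative improvement (`relative_gain(20.0, 15.0)`), which is why the abstract states which of the two it is reporting.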