Multi-task Pre-training for Lhasa-Tibetan Speech Recognition

Artificial Neural Networks and Machine Learning - ICANN 2023, Part IX (2023)

Abstract
Compared to mainstream languages such as Chinese and English, the Tibetan speech corpus is limited. Pre-training can improve speech recognition performance for low-resource languages by exploiting multilingual corpora: a neural network is first trained on a multilingual dataset and then fine-tuned on the low-resource language. In this paper, a multi-task serial pre-training method is proposed to address the limited resources in Tibetan speech recognition. By designing the number and order of tasks in the pre-training process, better recognition performance can be achieved. Experiments on the Lhasa-Tibetan speech recognition task show that the proposed method is significantly superior to the baseline model, achieving a Tibetan word error rate of 4.12%, a 9.34% reduction compared to the baseline model and 1.06% lower than the existing pre-training model.
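Since the abstract only sketches the pipeline at a high level, the following is a minimal PyTorch sketch of serial multi-task pre-training followed by fine-tuning. The LSTM encoder, CTC objective, vocabulary sizes, task order, and synthetic batches are all assumptions of ours for illustration, not the authors' actual model or corpora.

```python
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Shared acoustic encoder reused across all pre-training tasks."""
    def __init__(self, feat_dim=80, hidden=512):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

def synthetic_batches(vocab_size, n_batches=2, batch=4, frames=50, feat_dim=80):
    """Tiny random CTC batches standing in for a real speech corpus."""
    data = []
    for _ in range(n_batches):
        feats = torch.randn(batch, frames, feat_dim)
        targets = torch.randint(1, vocab_size, (batch, 10))  # index 0 is the CTC blank
        feat_lens = torch.full((batch,), frames, dtype=torch.long)
        tgt_lens = torch.full((batch,), 10, dtype=torch.long)
        data.append((feats, targets, feat_lens, tgt_lens))
    return data

def train_task(encoder, head, batches, epochs, lr=1e-4):
    """Train the shared encoder plus a task-specific CTC output head."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=lr)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    for _ in range(epochs):
        for feats, targets, feat_lens, tgt_lens in batches:
            log_probs = head(encoder(feats)).log_softmax(-1).transpose(0, 1)  # (T, N, C)
            loss = ctc(log_probs, targets, feat_lens, tgt_lens)
            opt.zero_grad()
            loss.backward()
            opt.step()

encoder = SpeechEncoder()

# Serial pre-training: tasks are visited one after another in a designed
# order (order and vocabulary sizes here are hypothetical), each task
# getting its own output head while the encoder is shared.
for vocab_size in (4000, 30):             # e.g. Chinese characters, English letters
    head = nn.Linear(512, vocab_size)
    train_task(encoder, head, synthetic_batches(vocab_size), epochs=2)

# Fine-tuning: reuse the pre-trained encoder on the low-resource Tibetan task.
tibetan_head = nn.Linear(512, 220)        # hypothetical Tibetan output units
train_task(encoder, tibetan_head, synthetic_batches(220), epochs=2, lr=1e-5)
```

The design choice the paper studies, the number and order of pre-training tasks, corresponds to the task list iterated in the serial pre-training loop above.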
Key words
Lhasa-Tibetan speech recognition, Multi-task, Serial pre-training