
Hybrid Syllable and Character Representations for Mandarin ASR

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC (2023)

Abstract
With the development of deep learning, End-to-End (E2E) automatic speech recognition (ASR) based on Connectionist Temporal Classification (CTC) and attention has achieved great success and become the most popular approach. In speech recognition, the choice of modeling units is critical. Mandarin ASR systems usually use Chinese characters as modeling units. However, homophones and polyphonic characters are very common in Chinese, which degrades ASR performance. Pinyin can be regarded as the syllable-level representation of Chinese characters and reflects their pronunciation. In E2E ASR, because of the sequence-to-sequence formulation, Chinese characters are mapped directly to acoustic features without any intermediate-level representation. In this paper, we introduce pinyin with tones as an auxiliary modeling unit to compensate for the mismatch between Chinese characters and acoustic features. Building on hybrid modeling of syllables and Chinese characters, we propose a multi-task ASR model that adds a syllable CTC decoder and a syllable-to-character attention decoder to the joint CTC-attention model. Furthermore, we propose a syllable-auxiliary attention-rescoring method. Compared with the character-based ASR model, our method achieves relative character error rate (CER) reductions of 8.6% and 9.4% on Aishell-1 with greedy search and attention rescoring, respectively.
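To make the role of syllable units more concrete, the following is a minimal sketch of standard CTC greedy-search decoding applied to a toy vocabulary of tonal pinyin syllables. It is not the authors' implementation: the vocabulary `SYLLABLES`, the homophone table `HOMOPHONES`, the blank index, and the frame scores are all hypothetical values chosen for illustration. It shows how a syllable CTC output is collapsed into a syllable sequence, and why one tonal syllable can still correspond to several characters.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol

# Hypothetical toy vocabulary of tonal pinyin syllables (index 0 is blank).
SYLLABLES = ["<blank>", "shi4", "jie4"]

# Hypothetical homophone table: one tonal syllable maps to several characters.
HOMOPHONES = {
    "shi4": ["是", "事", "世", "市"],
    "jie4": ["界", "借", "届"],
}


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Standard CTC greedy search: take the best label per frame,
    merge consecutive repeats, then drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    collapsed = [label for label, _ in groupby(best)]
    return [vocab[i] for i in collapsed if i != blank]


# Made-up per-frame scores over SYLLABLES, one row per acoustic frame.
frame_scores = [
    [0.10, 0.80, 0.10],  # shi4
    [0.10, 0.70, 0.20],  # shi4 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.10, 0.70],  # jie4
]

decoded = ctc_greedy_decode(frame_scores, SYLLABLES)
candidates = [HOMOPHONES[s] for s in decoded]
print(decoded)
print(candidates)
```

The decoded syllable sequence still leaves several character candidates per position, which is the ambiguity a syllable-to-character decoder has to resolve from context.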