A Chinese Speech Recognition System Based on Fusion Network Structure.

ICCT(2021)

Abstract
An automatic speech recognition system converts speech into readable text. Chinese is rich in homophones: words that share a pronunciation but differ in written form and meaning. At present, there is relatively little research on Chinese speech recognition. We therefore propose a Chinese automatic speech recognition system that combines an end-to-end acoustic model, built on the fusion network RRAINet, with a language model. We treat speech recognition as a vision problem and preprocess the data with the Mel spectrum and SpecAugment. The model is trained with the connectionist temporal classification (CTC) criterion and decoded with a greedy algorithm, converting speech signals into Chinese characters. Experiments show phoneme error rates of 12.56% and 12.38% on the dev and test sets of Free ST (ST-CMDS-20170001_1-OS). The word error rates are 18.79% and 18.74%, about 5% lower than the baseline VGG-CTC model.
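The abstract names greedy decoding of a CTC-trained model but does not spell out the procedure. The following is a minimal sketch of standard greedy CTC decoding, not the authors' implementation: it assumes the blank label sits at index 0, and the function name `ctc_greedy_decode` and the toy score matrix are hypothetical, chosen only for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: index of the CTC blank label


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    # Argmax over the label axis for each time frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blank symbols to obtain the output label sequence.
    return [label for label in collapsed if label != blank]


# Toy example: 6 frames, 3 labels (0 = blank; 1 and 2 are hypothetical symbols).
scores = [
    [0.10, 0.80, 0.10],  # best label: 1
    [0.10, 0.70, 0.20],  # best label: 1 (repeat, merged with previous frame)
    [0.90, 0.05, 0.05],  # best label: blank
    [0.20, 0.70, 0.10],  # best label: 1 (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # best label: 2
    [0.80, 0.10, 0.10],  # best label: blank
]
decoded = ctc_greedy_decode(scores)
print(decoded)
```

The blank between the second and fourth frames is what lets the same label appear twice in the output. In a system like the one described, the decoded labels would then be mapped to phonemes and passed to the language model to produce Chinese characters.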
Keywords
speech recognition,data preprocessing,Fusion structure,CTC,Markov language model