
Towards Context-Aware End-to-End Code-Switching Speech Recognition.

INTERSPEECH (2020)

Cited 6 | Views 22
Abstract
Code-switching (CS) speech recognition has drawn increasing attention in recent years, as code-switching is a common phenomenon in which speakers alternate between languages within a single utterance or discourse. In this work, we propose the Hierarchical Attention-based Recurrent Decoder (HARD) to build a context-aware end-to-end code-switching speech recognition system. HARD is an attention-based decoder that employs a hierarchical recurrent network to enhance the model's awareness of the previously generated history (sub-sequences) during decoding. The architecture uses two LSTMs to model encoder hidden states at both the character level and the sub-sequence level, which enables it to generate utterances that switch between languages more precisely from speech. We also employ language identification (LID) as an auxiliary task in multi-task learning (MTL) to boost recognition performance. We evaluate the model on the SEAME dataset: the multi-task learning HARD (MTL-HARD) model improves over the baseline Listen, Attend and Spell (LAS) model, reducing character error rate (CER) from 29.91% to 26.56% and mixed error rate (MER) from 38.99% to 34.50%, and a case study shows that MTL-HARD carries historical information across sub-sequences.
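The abstract describes two ideas only in prose: a decoder with two LSTMs operating at the character level and the sub-sequence level, and an auxiliary LID head trained jointly under MTL. The PyTorch sketch below is a minimal, hypothetical illustration of those ideas, not the paper's actual HARD implementation: all names, dimensions, the decision to update the sub-sequence LSTM at every step, and the 0.3 LID loss weight are assumptions for illustration.

```python
import torch
import torch.nn as nn

class HierarchicalDecoderSketch(nn.Module):
    """Two-level recurrent decoder with an auxiliary LID head (illustrative only)."""

    def __init__(self, enc_dim=256, hid_dim=256, vocab_size=3000, num_langs=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hid_dim)
        # Character-level LSTM consumes the attention context plus the previous token.
        self.char_lstm = nn.LSTMCell(enc_dim + hid_dim, hid_dim)
        # Sub-sequence-level LSTM summarizes the character-level states; in this
        # simplified sketch it is updated every step rather than at boundaries.
        self.subseq_lstm = nn.LSTMCell(hid_dim, hid_dim)
        self.attn = nn.Linear(hid_dim, enc_dim)         # query projection for dot-product attention
        self.char_out = nn.Linear(hid_dim, vocab_size)  # ASR output head
        self.lid_out = nn.Linear(hid_dim, num_langs)    # auxiliary LID head

    def attend(self, query, enc_states):
        # enc_states: (B, T, enc_dim); query: (B, hid_dim) -> context: (B, enc_dim)
        scores = torch.bmm(enc_states, self.attn(query).unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)
        return torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)

    def forward(self, enc_states, targets):
        B, T_out = targets.shape
        h_c = c_c = enc_states.new_zeros(B, self.char_lstm.hidden_size)
        h_s = c_s = enc_states.new_zeros(B, self.subseq_lstm.hidden_size)
        char_logits, lid_logits = [], []
        for t in range(T_out):
            # Condition attention on both decoding levels.
            context = self.attend(h_c + h_s, enc_states)
            emb = self.embed(targets[:, t])
            h_c, c_c = self.char_lstm(torch.cat([context, emb], dim=1), (h_c, c_c))
            h_s, c_s = self.subseq_lstm(h_c, (h_s, c_s))
            char_logits.append(self.char_out(h_c))
            lid_logits.append(self.lid_out(h_s))
        return torch.stack(char_logits, 1), torch.stack(lid_logits, 1)
```

A hypothetical training step combining the ASR loss with the auxiliary LID loss, using random tensors in place of real encoder outputs and transcripts:

```python
# Hypothetical usage; 0.3 is an assumed LID loss weight, not from the paper.
dec = HierarchicalDecoderSketch()
enc = torch.randn(4, 50, 256)           # (batch, frames, enc_dim) from some encoder
tgt = torch.randint(0, 3000, (4, 21))   # character ids, teacher-forced
lid = torch.randint(0, 2, (4, 21))      # per-character language ids
char_logits, lid_logits = dec(enc, tgt[:, :-1])   # predict next token from prefix
ce = nn.CrossEntropyLoss()
loss = ce(char_logits.transpose(1, 2), tgt[:, 1:]) \
       + 0.3 * ce(lid_logits.transpose(1, 2), lid[:, 1:])
```

In the actual HARD model, the sub-sequence LSTM would presumably advance on sub-sequence boundaries rather than at every character, but this listing does not specify that detail.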
Keywords
speech recognition, code-switching