Multilingual Speech Recognition with Self-Attention Structured Parameterization

INTERSPEECH (2020)

Abstract
Multilingual automatic speech recognition systems can transcribe utterances from different languages. These systems are attractive from several perspectives: they can provide quality improvements, especially for lower-resource languages, and they simplify the training and deployment procedure. End-to-end speech recognition has further simplified multilingual modeling, as a single model, instead of the several components of a classical system, has to be unified. In this paper, we investigate a streamable end-to-end multilingual system based on the Transformer Transducer [1]. We propose several techniques for adapting the self-attention architecture based on the language id. We analyze the trade-offs of each method with regard to quality gains and the number of additional parameters introduced. We conduct experiments on a real-world task consisting of five languages. Our experimental results demonstrate ∼8% to ∼20% relative gain over the baseline multilingual model.
Keywords
speech recognition, multilingual, RNN-T, Transformer Transducer, language id