Optimizing multilingual knowledge transfer for time-delay neural networks with low-rank factorization

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
When building speech-to-text (STT) systems for a low-resource language, it is often beneficial to use knowledge obtained from a significantly larger multilingual dataset. We have seen benefits from using a multilingual time-delay neural network (TDNN) as the initialization for training an acoustic model on a target low-resource language. In this work, we expand upon recent research that found benefits from applying sequential low-rank factorization (LRF), extending it to a TDNN acoustic model trained on a large multilingual corpus. We also examine and optimize the knowledge transfer methodology, with the goal of avoiding the loss of useful information from the multilingual initialization during the transfer process. Our approach limits updates to the multilingual network parameters during lattice-free maximum mutual information (LF-MMI) training on the target low-resource language by fixing the multilingual parameters and optimizing only the target output layer. The multilingual parameters and the new output layer are then jointly optimized using the state-level minimum Bayes risk (sMBR) objective function. By combining sequential LRF with this optimization method, we show an average absolute WER reduction of 1.2% across low-resource target languages, a better result than our previous best approach.
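The two transfer ideas in the abstract, factorizing dense TDNN weight matrices into thin low-rank pairs and freezing the multilingual layers while only the new target output layer is trained, can be sketched concretely. The paper itself works with Kaldi-style TDNNs under LF-MMI and sMBR objectives; the PyTorch snippet below is only an illustrative sketch of the underlying matrix operations, not the authors' recipe, and both function names are hypothetical.

import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    # Approximate a dense weight matrix W (out_dim x in_dim) by the product
    # of two thin matrices via truncated SVD, replacing one nn.Linear with a
    # bottleneck pair. This is generic SVD-based low-rank factorization,
    # standing in here for the sequential LRF applied to each TDNN layer.
    W = linear.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_S = torch.sqrt(S[:rank])  # split the kept singular values evenly
    bottom = nn.Linear(W.shape[1], rank, bias=False)
    top = nn.Linear(rank, W.shape[0], bias=linear.bias is not None)
    bottom.weight.data = sqrt_S.unsqueeze(1) * Vh[:rank]   # (rank, in_dim)
    top.weight.data = U[:, :rank] * sqrt_S                 # (out_dim, rank)
    if linear.bias is not None:
        top.bias.data = linear.bias.data.clone()
    return nn.Sequential(bottom, top)

def freeze_multilingual(body: nn.Module, target_output: nn.Module) -> None:
    # First transfer stage described in the abstract: fix all multilingual
    # parameters and optimize only the new target-language output layer.
    # A later stage would re-enable gradients everywhere for joint
    # fine-tuning (the paper uses the sMBR objective for that stage).
    for p in body.parameters():
        p.requires_grad = False
    for p in target_output.parameters():
        p.requires_grad = True

As a rough sense of the payoff, factorizing a hypothetical 1024x1024 TDNN affine layer with rank 256 cuts its parameter count exactly in half (1024*256 + 256*1024 vs. 1024*1024), while the truncated SVD initializes the pair at the best rank-256 approximation of the original transform.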
Keywords
speech recognition, multilingual training