Developing State-of-the-Art End-to-End ASR for Norwegian.

TSD (2023)

Abstract
We present the process of developing a modern end-to-end (E2E) automatic speech recognition (ASR) system for Norwegian (NO), which is a challenging language with many dialects and two written standards (Bokmål and Nynorsk). Since the existing speech corpora for this language are severely limited, we have had to acquire large amounts of additional data. This acquisition has been done by automatic processing of publicly accessible broadcast and parliament archives, YouTube and podcast channels, and also audiobooks. The data-harvesting process has been controlled by the ASR system, whose model has continuously been updated on the extracted chunks of speech. The final model has been trained on 1,246 h of Norwegian and further enhanced by transfer learning from an existing Swedish model. The performance of the ASR system has been evaluated on an 18-h collection of test sets (most of them publicly available) representing different application areas. Our best word error rate (WER) achieved on this collection is 7.6%, which is better than the results obtained from Google and Microsoft cloud services.
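The word error rate quoted above is conventionally defined as WER = (S + D + I) / N, where S, D and I are the substituted, deleted and inserted words in the recognizer output and N is the number of words in the reference transcript. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate`, the whitespace tokenization and the sample sentences are assumptions made for illustration; the abstract does not say what text normalization was applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (word-level Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is missing.
example_wer = word_error_rate("jeg bor i oslo", "jeg bor oslo")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference. A score over a collection of test sets is usually computed from total errors divided by total reference words, although the abstract does not state which aggregation was used.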
Keywords
Norwegian, state-of-the-art, end-to-end