LibriSpeech: An ASR Corpus Based on Public Domain Audio Books

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)

Abstract
This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give a lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.
Keywords
Speech Recognition, Corpus, LibriVox