Deep bi-directional recurrent networks over spectral windows

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015

Cited by 53 | Views 169
Abstract
Long short-term memory (LSTM) acoustic models have recently achieved state-of-the-art results on speech recognition tasks. As a type of recurrent neural network, LSTMs potentially have the ability to model long-span phenomena relating the spectral input to linguistic units. However, it has not been clear whether their observed performance is actually due to this capability, or instead due to better modeling of short-term dynamics through the recurrence. In this paper, we answer this question by applying a windowed (truncated) LSTM to conversational speech transcription, and find that a limited context is adequate and that it is not necessary to scan the entire utterance. The sliding-window approach allows not only incremental (online) recognition with a bidirectional model, but also frame-wise randomization (as opposed to utterance randomization), which results in faster convergence. On the SWBD/Fisher corpus, applying bidirectional LSTM RNNs to spectral windows of about 0.5 s improves WER on the Hub5'00 benchmark set by 16% relative compared to our best sequence-trained DNN. On an extended 3850 h training set that also includes lectures, the relative gain becomes 28% (Hub5'00 WER 9.2%). In-house conversational data improves by 12 to 17% relative.
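To illustrate the sliding-window idea described above, here is a minimal sketch (not the authors' code; the layer count, hidden size, senone count, and 51-frame window are assumed placeholders, with 51 frames at a 10 ms hop giving roughly 0.5 s of context). A bidirectional LSTM is run over a fixed spectral window centered on each frame, and the BLSTM output at the window center is used to classify that frame, which is what makes per-frame training examples, and hence frame-wise randomization, possible.

```python
# Sketch of a windowed (truncated) bidirectional LSTM acoustic model.
# Assumption: hyperparameters below are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowedBLSTM(nn.Module):
    def __init__(self, feat_dim=40, hidden=512, layers=3,
                 n_senones=9000, window=51):
        super().__init__()
        self.window = window
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=layers,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_senones)

    def forward(self, windows):
        # windows: (batch, window, feat_dim) -- one spectral window per
        # target frame, so the batch can mix frames from many utterances.
        h, _ = self.blstm(windows)
        center = h[:, self.window // 2, :]  # BLSTM output at the center frame
        return self.out(center)

def sliding_windows(utt, window=51):
    """Cut an utterance (frames, feat_dim) into one window per frame by
    padding the edges and unfolding along the time axis."""
    half = window // 2
    padded = F.pad(utt.t(), (half, half))            # (feat_dim, T + 2*half)
    return padded.unfold(1, window, 1).permute(1, 2, 0)  # (T, window, feat_dim)
```

Because each training example is a self-contained (window, label) pair, examples from different utterances can be shuffled together, approximating the frame-wise randomization that the abstract credits with faster convergence.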
Keywords
Recurrent networks, LSTM, Deep learning, acoustic modeling