Strategies for training large scale neural network language models

Automatic Speech Recognition and Understanding (2011)

Abstract
We describe how to effectively train neural network based language models on large data sets. Faster convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model, which leads to a significant reduction in computational complexity. We achieved around a 10% relative reduction in word error rate on an English Broadcast News speech recognition task, compared to a large 4-gram model trained on 400M tokens.
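The abstract refers to a hash-based maximum entropy model trained jointly with the neural network language model. The following is a minimal Python sketch of that general idea, in which n-gram features are hashed into a shared fixed-size weight table and the resulting scores are added to the network's output logits before the softmax. All names, sizes, and the hash function below (VOCAB_SIZE, HASH_SIZE, feature_indices, etc.) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sizes; the paper's actual hash table is much larger.
VOCAB_SIZE = 10_000        # assumed vocabulary size
HASH_SIZE = 1 << 20        # size of the shared hash table for n-gram weights
MAX_ORDER = 3              # use unigram, bigram, and trigram context features

# One flat parameter vector holds all (hashed) n-gram feature weights.
maxent_weights = np.zeros(HASH_SIZE, dtype=np.float32)

def feature_indices(history, word):
    """Hash each (context n-gram, candidate word) pair into the weight table."""
    idxs = []
    for order in range(MAX_ORDER):
        context = tuple(history[-order:]) if order > 0 else ()
        idxs.append(hash((context, word)) % HASH_SIZE)
    return idxs

def maxent_logits(history):
    """Direct-connection scores for every word, summed over hashed features."""
    logits = np.zeros(VOCAB_SIZE, dtype=np.float32)
    for w in range(VOCAB_SIZE):
        logits[w] = sum(maxent_weights[i] for i in feature_indices(history, w))
    return logits

def combined_distribution(nn_logits, history):
    """Add the maxent scores to the neural network's logits, then softmax."""
    z = nn_logits + maxent_logits(history)
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()
```

In practice the per-word loop would be avoided (for example by scoring only words in the current softmax class), but the sketch shows how hash collisions keep the memory footprint fixed regardless of how many distinct n-grams appear in the training data.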
Keywords
computational complexity, learning (artificial intelligence), neural nets, speech recognition, 4-gram model, English Broadcast News speech recognition task, hash-based implementation, large scale neural network language model training, maximum entropy model, training data, word error rate