Improvements To Filterbank And Delta Learning Within A Deep Neural Network Framework

ICASSP (2014)

Abstract
Many features used in speech recognition are hand-crafted and are not always tied to the objective at hand, namely minimizing word error rate (WER). We recently showed that replacing a perceptually motivated mel-filter bank with a filter bank layer learned jointly with the rest of a deep neural network is promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate several speaker adaptation techniques, including vocal tract length normalization (VTLN) warping and speaker identity features. On a 50-hour English Broadcast News task, filter and delta learning yields a 5% relative improvement in WER over a fixed set of filters and deltas. Furthermore, after speaker adaptation, filter and delta learning gives a 3% relative improvement in WER over a state-of-the-art CNN.
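The abstract does not spell out the layer definitions, so the following is only a minimal sketch of what a learnable filter bank layer and a learnable delta layer might compute in the forward pass. Everything here is an assumption made for illustration: the function names, the use of exp() to keep filter gains positive, the edge handling by frame repetition, and the choice to initialize the delta weights from the standard regression coefficients. It is not presented as the authors' implementation, and training (backpropagation through both layers) is omitted.

```python
import math


def filterbank_layer(power_spectrum, weights):
    """Log filter bank outputs with learnable weights (illustrative only).

    weights[j][k] is an unconstrained parameter for filter j and bin k.
    Assumption of this sketch: exp() keeps the effective gain positive,
    so the logarithm is always defined.
    """
    return [
        math.log(sum(math.exp(w) * p for w, p in zip(row, power_spectrum)))
        for row in weights
    ]


def fixed_delta_coeffs(half_width=2):
    """Standard regression delta coefficients over 2 * half_width + 1 frames."""
    norm = 2 * sum(n * n for n in range(1, half_width + 1))
    return [n / norm for n in range(-half_width, half_width + 1)]


def delta_layer(frames, coeffs):
    """Weighted sum over neighboring frames; edge frames are repeated.

    coeffs has odd length 2N + 1, and coeffs[n + N] multiplies frame t + n.
    With fixed_delta_coeffs() this is the usual hand-crafted delta; treating
    coeffs as trainable parameters is the hypothetical "learned" variant.
    """
    half = len(coeffs) // 2
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        acc = [0.0] * len(frames[0])
        for n in range(-half, half + 1):
            src = frames[min(max(t + n, 0), last)]  # clamp at the edges
            for d, value in enumerate(src):
                acc[d] += coeffs[n + half] * value
        out.append(acc)
    return out
```

In this sketch, a fixed system would keep `weights` at mel-filter values and `coeffs` at `fixed_delta_coeffs()`, while a learned system would update both jointly with the rest of the network.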
Keywords
channel bank filters, learning (artificial intelligence), neural nets, speaker recognition, speech recognition, CNN, English Broadcast News task, VTLN warping, WER, deep neural network, delta learning, filter learning, perceptually motivated mel-filter bank, speaker adaptation, speaker adaptation techniques, speaker identity features, word error rate