Improvements To Filterbank And Delta Learning Within A Deep Neural Network Framework

Tara N. Sainath, Brian Kingsbury, Abdel-Rahman Mohamed, George Saon, Bhuvana Ramabhadran

ICASSP (2014)


Abstract

Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, namely minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that filter and delta learning achieves a 5% relative improvement in word error rate (WER) compared to a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning yields a 3% relative improvement in WER compared to a state-of-the-art CNN.
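The idea of a learned filter bank layer followed by learned deltas can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function names, the use of `exp` to keep filter weights positive, the edge padding, and the initialization of the delta weights from the standard regression formula are all assumptions made for illustration. In a real system both parameter sets would be updated by backpropagation together with the rest of the network.

```python
import numpy as np

def filterbank_layer(power_spec, filt_params):
    """Hypothetical learned filter bank layer (forward pass only).

    power_spec:  (T, F) nonnegative power spectrum frames.
    filt_params: (F, M) unconstrained parameters; exp() keeps the
                 effective filter weights positive (an assumption here).
    Returns (T, M) log filter bank outputs.
    """
    weights = np.exp(filt_params)
    return np.log(power_spec @ weights + 1e-8)

def regression_delta_init(n=2):
    """Standard regression delta weights over time offsets -n..n,
    used here only as a starting point for learnable delta weights."""
    offsets = np.arange(-n, n + 1)
    return offsets / (2.0 * np.sum(np.arange(1, n + 1) ** 2))

def delta_layer(feats, delta_weights):
    """Hypothetical learned delta layer: a weighted sum of neighboring frames.

    feats:         (T, M) features.
    delta_weights: (2n+1,) weights over time offsets -n..n.
    Edges are handled by repeating the first and last frame.
    """
    n = (len(delta_weights) - 1) // 2
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    return sum(w * padded[k:k + num_frames] for k, w in enumerate(delta_weights))

# Toy usage with random data: 20 frames, 257 frequency bins, 40 filters.
rng = np.random.default_rng(0)
power_spec = rng.random((20, 257))
filt_params = rng.normal(scale=0.1, size=(257, 40))
static = filterbank_layer(power_spec, filt_params)
delta = delta_layer(static, regression_delta_init(2))
features = np.concatenate([static, delta], axis=1)  # static + delta features
```

With the regression initialization, the delta layer starts out equal to conventional fixed deltas, so training can only move away from that baseline where doing so helps the objective.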


Keywords

channel bank filters, learning (artificial intelligence), neural nets, speaker recognition, speech recognition, CNN, English Broadcast News task, VTLN warping, WER, deep neural network, delta learning, filter learning, perceptually motivated mel-filter bank, speaker adaptation, speaker adaptation techniques, speaker identity features, word error rate
