Noise Robust Speech Recognition Using Recent Developments In Neural Networks For Computer Vision

Takuya Yoshioka, Katsunori Ohnishi, Fuming Fang, Tomohiro Nakatani

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016)

Abstract

Convolutional neural networks (CNNs) outperform fully connected neural networks in various speech recognition tasks, and the advantage is especially pronounced in noisy environments. In recent years, the computer vision community has proposed many techniques for improving the classification performance of CNNs. This paper considers two approaches recently developed for image classification and examines their impact on noisy speech recognition performance. The first approach is to increase the depth of the convolution layers. Different ways of deepening the CNNs are compared. In particular, the usefulness of learning dynamic features with small convolution layers that perform convolution in time is shown, along with a modulation frequency analysis of the learned convolution filters. The second approach is to use trainable activation functions; specifically, the use of a Parametric Rectified Linear Unit (PReLU) is investigated. Experimental results show that both approaches yield significant improvements in performance. Combining the two approaches further reduces recognition errors, producing a word error rate of 11.1% on the Aurora4 task, the best published result for this corpus, with a standard one-pass bi-gram decoding set-up.
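
For readers unfamiliar with the trainable activation function mentioned in the abstract, the following is a minimal sketch of a PReLU in plain Python. It uses the standard definition (identity for positive inputs, a learnable slope for negative inputs) and is not the authors' implementation; the function names and the slope value 0.25 are illustrative assumptions, not details from the paper.

```python
def prelu(x, a):
    """PReLU: pass positive inputs through, scale non-positive inputs by slope a."""
    return x if x > 0 else a * x


def prelu_grad_a(x):
    """Derivative of prelu(x, a) with respect to the slope a.

    This is what makes the activation trainable: the slope receives
    a gradient only from non-positive inputs.
    """
    return 0.0 if x > 0 else x


# Illustrative values only (assumed, not taken from the paper).
inputs = [-2.0, -0.5, 0.0, 1.5]
slope = 0.25
outputs = [prelu(x, slope) for x in inputs]
print(outputs)
```

With the slope fixed at 0 this reduces to an ordinary ReLU; in a network, the slope would be updated by backpropagation together with the weights.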

Keywords

Automatic speech recognition, noise robustness, convolutional neural network, parametric rectified linear unit
