Harmonic feature fusion for robust neural network-based acoustic modeling

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017

Abstract
Acoustic modeling with deep learning has drastically improved the performance of automatic speech recognition (ASR), yet the mainstream acoustic feature is still the log-Mel filtered one. While log-Mel filtered features lose harmonic-structure information, they still include useful information for ASR. Several attempts have been made to integrate higher-resolution information into the network. To improve ASR accuracy in noisy conditions, we propose new features, integrated into acoustic modeling, that represent which parts of the time-frequency domain have a distinct harmonic structure, since harmonic structure is only partially observed in noisy environments. The new features are combined with the standard acoustic features, and the network is trained on them using various noisy data. Through these operations, the network learns the acoustic features together with a kind of quality tag describing which parts are clean and which are degraded. Our model reduced the word error rate on the Aurora-4 task by 10.3% with a DNN compared with a strong baseline, while retaining high accuracy on clean test cases.
Keywords
harmonic structure,data augmentation,feature fusion,acoustic model,neural networks