Hybrid Attention Time-Frequency Analysis Network for Single-Channel Speech Enhancement

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
The time-frequency domain remains central to speech signal analysis, and improving neural speech enhancement models demands detailed multi-scale analysis of time-frequency features. This study presents the Hybrid Attention Time-Frequency Analysis Network (HATFANet), a model with a dual-branch structure that concurrently estimates the ideal ratio mask and the enhanced complex spectrum. Each branch incorporates Hybrid Attention Blocks (HABs) that capture local, global, and inter-window attention for more effective deep feature extraction, using reshaping operations and gated multi-layer perceptrons to attend at different scales. Residual channel attention and a window multi-head self-attention mechanism further accentuate channel-wise features and intra-window attention. Our experiments verify the pivotal role of these HABs across the different attention scales. HATFANet achieves state-of-the-art results on the Voice Bank + DEMAND dataset, recording 3.37 PESQ, 95.8% STOI, and 10.15 SSNR.
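The abstract names two generic building blocks: window-based (intra-window) self-attention and a gated multi-layer perceptron. The following is a minimal NumPy sketch of those two mechanisms in isolation; all function names, shapes, and activation choices here are illustrative assumptions, not the authors' HAB implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, win):
    """Split a (T, C) feature map into non-overlapping windows of length
    `win` and apply scaled dot-product self-attention within each window
    (intra-window attention). Single-head for brevity."""
    t, c = x.shape
    assert t % win == 0, "sequence length must be divisible by the window size"
    xw = x.reshape(t // win, win, c)                   # (num_windows, win, C)
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(c)   # (num_windows, win, win)
    attn = softmax(scores, axis=-1)
    return (attn @ xw).reshape(t, c)

def gated_mlp(x, w1, w2, w_out):
    """Gated MLP: one projected branch is gated element-wise by the other,
    then projected back to the input width."""
    u = np.maximum(x @ w1, 0.0)  # activated branch (ReLU assumed)
    v = x @ w2                   # gating branch
    return (u * v) @ w_out
```

Reshaping the sequence into windows before attention, as above, is what limits the attention span to a single window; attention across windows (the inter-window scale the abstract mentions) would require a second pass over features regrouped along the window axis.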
Keywords
Speech enhancement, hybrid attention, gated multi-layer perceptron, time-frequency analysis