Restoration of Bone-Conducted Speech With U-Net-Like Model and Energy Distance Loss

Changtao Li, Feiran Yang, Jun Yang

IEEE Signal Processing Letters (2024)

Abstract
Bone-conducted speech is less susceptible to ambient noise interference, but it suffers from poor speech quality due to its limited bandwidth. In this letter, we propose a U-Net-like network for the restoration of bone-conducted speech in the time domain. The proposed network consists of residual-connected one-dimensional convolutions and shifted window-based attention modules, which can model the long-term dependencies that are crucial in speech processing. We find that the prevalent time-domain L1 loss may be insufficient for generating the high-frequency information that is absent in bone-conducted speech. To address this issue, we propose to use the generalized energy distance loss based on multi-scale Mel spectrograms as the objective function. Experimental results on the ESMB dataset validate the efficacy of the proposed method in restoring bone-conducted speech. The proposed approach significantly outperforms two recent time-domain benchmarks, DPT-EGNet and EBEN, in terms of the PESQ and STOI metrics.
Keywords
Bone-conducted speech, speech enhancement, speech synthesis, attention, spectral energy distance
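
The abstract names a generalized energy distance loss over multi-scale Mel spectrograms but does not spell out its form. The following is a minimal NumPy sketch of how such a loss is commonly constructed, not the authors' implementation. It assumes the standard energy-distance estimate d(x, y1) + d(x, y2) - d(y1, y2), where x is the air-conducted target and y1, y2 are two independently generated restorations of the same input. The distance d (an L1 distance between log-Mel spectrograms), the three FFT/Mel scales, the 16 kHz sampling rate, and all function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_fft, n_mels, sr):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (fft_freqs[None, :] - lower) / (center - lower)
    falling = (upper - fft_freqs[None, :]) / (upper - center)
    return np.clip(np.minimum(rising, falling), 0.0, None)

def log_mel(x, n_fft, n_mels, sr):
    """Log-Mel spectrogram of a 1-D signal, shape (n_frames, n_mels)."""
    hop = n_fft // 4
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag @ mel_filterbank(n_fft, n_mels, sr).T + 1e-5)

def multiscale_mel_distance(a, b, sr=16000,
                            scales=((256, 32), (512, 64), (1024, 80))):
    """Sum over (n_fft, n_mels) scales of the mean L1 log-Mel difference.
    The scales and the L1 choice are assumptions for illustration."""
    total = 0.0
    for n_fft, n_mels in scales:
        diff = log_mel(a, n_fft, n_mels, sr) - log_mel(b, n_fft, n_mels, sr)
        total += np.mean(np.abs(diff))
    return total

def energy_distance_loss(target, gen_a, gen_b, **kwargs):
    """Single-sample estimate of the generalized energy distance.
    The target-vs-target term is dropped because it does not depend
    on the generator."""
    attract = (multiscale_mel_distance(target, gen_a, **kwargs)
               + multiscale_mel_distance(target, gen_b, **kwargs))
    repel = multiscale_mel_distance(gen_a, gen_b, **kwargs)
    return attract - repel
```

In this form the first two terms pull each generated waveform toward the target, while the subtracted term rewards diversity between the two generated samples. In a training setup, gen_a and gen_b would presumably come from two forward passes on the same bone-conducted input with different random noise; that detail is an assumption here, and a differentiable framework would replace NumPy.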