Improving Design of Input Condition Invariant Speech Enhancement
CoRR (2024)
Abstract
Building a single universal speech enhancement (SE) system that can handle
arbitrary input is an in-demand but underexplored research topic. Towards this
ultimate goal, one direction is to build a single model that handles diverse
audio durations, sampling frequencies, and microphone variations in noisy and
reverberant scenarios, which we define here as "input condition invariant SE".
Such a model was recently proposed and showed promising performance; however,
its multi-channel performance degraded severely in real conditions. In this
paper, we propose novel architectures to improve the input condition invariant
SE model so that performance in simulated conditions remains competitive while
the degradation in real conditions is substantially mitigated. For this purpose, we redesign the
key components that comprise such a system. First, we identify that the
channel-modeling module's generalization to unseen scenarios can be sub-optimal
and redesign this module. We further introduce a two-stage training strategy to
enhance training efficiency. Second, we propose two novel dual-path
time-frequency blocks that achieve superior performance with fewer parameters
and lower computational cost than the existing method. With all proposals
combined, experiments on various public datasets validate the efficacy of the
proposed model, with significantly improved performance in real conditions.
The recipe with full model details is released at https://github.com/espnet/espnet.
Keywords
Universal speech enhancement, sampling-frequency-independent, microphone-number-invariant