Investigating the Design Space of Diffusion Models for Speech Enhancement
CoRR (2023)
Abstract
Diffusion models are a new class of generative models that have shown
outstanding performance in the image generation literature. As a consequence,
studies have attempted to apply diffusion models to other tasks, such as speech
enhancement. A popular approach for adapting diffusion models to speech
enhancement consists of modelling a progressive transformation between the
clean and noisy speech signals. However, one popular diffusion model framework
previously established in the image generation literature does not account for
such a transformation towards the system input, which prevents relating
existing diffusion-based speech enhancement systems to that framework. To
address this, we extend the framework to account for the progressive
transformation between the clean and noisy speech signals. This allows us to
apply recent developments from the image generation literature and to
systematically investigate design aspects of diffusion models that remain
largely unexplored for speech enhancement, such as the neural network
preconditioning, the training loss weighting, the stochastic differential
equation (SDE), and the amount of stochasticity injected in the reverse
process. We show that the performance of previous diffusion-based speech
enhancement systems cannot be attributed to the progressive transformation
between the clean and noisy speech signals. Moreover, we show that a proper
choice of preconditioning, training loss weighting, SDE and sampler makes it
possible to outperform a popular diffusion-based speech enhancement system in
terms of perceptual metrics while using fewer sampling steps, thus reducing
the computational cost by a factor of four.
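The progressive transformation mentioned above can be illustrated with a minimal sketch. This is not the paper's exact formulation: it assumes an Ornstein-Uhlenbeck-style forward process whose mean drifts from the clean signal toward the noisy signal while Gaussian perturbation variance grows over time, a common choice in diffusion-based speech enhancement; the parameters `gamma` and `sigma` are illustrative values, not the paper's.

```python
import numpy as np

def forward_mean(x0, y, t, gamma=1.5):
    """Mean of the process at time t: equals the clean signal x0 at t=0
    and drifts toward the noisy signal y as t grows (OU-style interpolation)."""
    w = np.exp(-gamma * t)
    return w * x0 + (1.0 - w) * y

def sample_state(x0, y, t, gamma=1.5, sigma=0.5, rng=None):
    """Draw a perturbed state x_t ~ N(mean(t), var(t)), with variance
    growing from 0 toward sigma^2 as t increases."""
    rng = np.random.default_rng() if rng is None else rng
    var = sigma**2 * (1.0 - np.exp(-2.0 * gamma * t))  # OU-style variance schedule
    return forward_mean(x0, y, t, gamma) + np.sqrt(var) * rng.standard_normal(x0.shape)

# Toy example: a clean 440 Hz tone and a synthetically "noisy" version of it.
x0 = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
y = x0 + 0.3 * np.random.default_rng(0).standard_normal(16000)
xt = sample_state(x0, y, t=0.5, rng=np.random.default_rng(1))
```

A reverse process trained on such states would start from a sample centred on the noisy signal `y` and iteratively transport it back toward the clean signal `x0`; the design choices studied in the paper (preconditioning, loss weighting, SDE, sampler stochasticity) all shape that reverse trajectory.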