A Score-aware Generative Approach for Music Signals Inpainting

2023 4TH INTERNATIONAL SYMPOSIUM ON THE INTERNET OF SOUNDS (2023)

Abstract
Several issues can seriously degrade the quality of digital audio, such as packet loss on IP-based networks or damaged storage media, impairing intelligibility and user experience. This paper presents a generative approach that aims to repair lost fragments in audio streams. Inspired by the well-established image-to-image translation ability of generative adversarial networks (GANs) and based on the bin2bin framework, previously introduced for speech inpainting, we propose an enhanced framework that translates CQT magnitude spectrograms of music signal frames with lost regions into reliable spectrograms. The goal is to reconstruct the missing audio segments effectively, enabling a seamless listening experience. The proposed pipeline combines the traditional GAN discriminative loss function with two additional objectives: a loss function related to perceptual audio quality, and a second one based on the L2 norm between the true and predicted piano-roll, estimated from the CQT reconstruction. Through comprehensive evaluations on gaps of 375 ms and 750 ms, which the literature considers to be of "small" and "medium" duration respectively, we demonstrate the robustness and effectiveness of our framework in producing coherent reconstructions with reduced artifacts. The proposed approach outperforms a baseline cGAN-based method, GACELA. In terms of ODG score, a metric inspired by a human-based scoring system, we achieve a performance gain of up to 13.3%, while the improvement in Structural Similarity (SSIM) between the clean and restored spectrograms reaches 13.6%.
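The three-term objective described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the non-saturating form of the adversarial term, the weights `lambda_perc` and `lambda_roll`, the function names, and the representation of a piano-roll as a list of rows. The perceptual term is passed in as a precomputed number because the abstract does not say how it is computed.

```python
import math


def l2_piano_roll_loss(true_roll, pred_roll):
    """L2 norm of the difference between two piano-rolls.

    Each piano-roll is a list of rows of floats (hypothetical layout).
    """
    squared_sum = sum(
        (t - p) ** 2
        for t_row, p_row in zip(true_roll, pred_roll)
        for t, p in zip(t_row, p_row)
    )
    return math.sqrt(squared_sum)


def adversarial_loss(disc_prob_fake):
    """Generator-side adversarial term, -log D(G(x)).

    Assumed form: the abstract only mentions a 'traditional GAN' loss.
    """
    eps = 1e-12  # guard against log(0)
    return -math.log(max(disc_prob_fake, eps))


def generator_objective(disc_prob_fake, perceptual_loss, true_roll, pred_roll,
                        lambda_perc=1.0, lambda_roll=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (
        adversarial_loss(disc_prob_fake)
        + lambda_perc * perceptual_loss
        + lambda_roll * l2_piano_roll_loss(true_roll, pred_roll)
    )
```

In a real training loop these terms would be computed on tensors by a deep learning framework so that gradients can flow back to the generator; the plain-Python version only shows how the terms are combined.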
Keywords
Music Inpainting, Spectrogram Inpainting, Conditional Generative Adversarial Networks, bin2bin