Enhancing Spatial Audio Generation with Source Separation and Channel Panning Loss

Wootaek Lim, Juhan Nam

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
Spatial audio is essential for many immersive content services; however, it is challenging to obtain or create. Recently, multimodal ambisonic audio generation has emerged as a promising approach to address this limitation. It combines multiple modalities, such as audio and video, and provides more intuitive control over ambisonic audio generation. Moreover, it leverages machine-learning methods to automatically learn the correlations between different features and generate high-quality ambisonic sound. Herein, we propose a separation- and localization-based spatial audio generation model. First, the network extracts visual features and separates the audio into sound sources. Then, it performs localization by mapping the separated sound sources to the visual features. To overcome the performance limitation of the previous self-supervised source separation approach, we employ a pretrained source separator with superior performance. To further improve localization performance, we propose a channel panning loss function defined between the channels of the ambisonic signal. We experimentally train the model on three different types of datasets and evaluate the proposed method with four metrics. The results show that the proposed model achieves better spatialization performance than the baseline models.
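The abstract does not give the exact form of the channel panning loss. As a rough illustration only, one plausible interpretation is a penalty on mismatches in the pairwise level differences between ambisonic channels, since inter-channel differences carry the panning (direction) cues. The function below is a hypothetical NumPy sketch under that assumption, not the paper's actual loss; the shape convention `(channels, samples)` and the L1 distance are also assumptions.

```python
import numpy as np

def channel_panning_loss(pred, target):
    """Hypothetical channel panning loss (illustrative sketch, not the
    paper's definition): L1 distance between the pairwise channel
    differences of the predicted and reference ambisonic signals.

    pred, target: float arrays of shape (channels, samples),
    e.g. 4 channels (W, X, Y, Z) for first-order ambisonics.
    """
    num_channels = pred.shape[0]
    loss = 0.0
    num_pairs = 0
    for i in range(num_channels):
        for j in range(i + 1, num_channels):
            # The difference between two channels encodes how a source
            # is panned between them; match it to the reference.
            pred_diff = pred[i] - pred[j]
            target_diff = target[i] - target[j]
            loss += np.mean(np.abs(pred_diff - target_diff))
            num_pairs += 1
    return loss / num_pairs
```

When the prediction equals the reference, the loss is zero; any error in the relative balance between channels (i.e., in the spatial placement) increases it, even if the overall mixture energy is correct.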
Keywords
Spatial audio generation,Source separation,Channel panning loss,Ambisonics,Deep learning