Framewise Multiple Sound Source Localization and Counting Using Binaural Spatial Audio Signals

Lei Wang, Zhibin Jiao, Qiyong Zhao, Jie Zhu, Yang Fu

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Abstract
Sound source localization is the problem of estimating the positions of one or several sound sources. For binaural audio, localization is a key perceptual characteristic that can be assessed subjectively or objectively. For objective evaluation of binaural sound localization, typical methods exploit binaural or monaural cues to predict the directions of sound sources. Since multiple sound sources are often perceived simultaneously in everyday sound scenes, an objective sound localization model that can detect temporally overlapping sources is required. In this paper, we propose a binaural multiple sound source localization network (BMSSLnet) model, which can predict framewise azimuths in a binaural audio signal without prior knowledge of the number of sound sources. We implement multiple azimuth prediction as a multi-label classification task, and propose to use separated multi-label cross-entropy and mean square error as the loss function. Experimental results show that the proposed model achieves an average precision of 0.9 and 0.75 for spatial prediction on the anechoic and reverberant datasets, respectively, with up to three temporally overlapping sources. Framewise temporal prediction is achieved with an average accuracy of 38.3 ms.
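To make the multi-label formulation concrete, the following is a minimal sketch of how a single frame's azimuths might be encoded, scored, and decoded. It is not the BMSSLnet implementation: the abstract does not specify the "separated" cross-entropy formulation, so plain per-class binary cross-entropy is used as a stand-in, and the 10-degree azimuth resolution, the 0.5 decision threshold, the equal weighting of the two loss terms, and all function names are assumptions made for illustration.

```python
import math


def azimuths_to_multi_hot(azimuths_deg, resolution_deg=10):
    """Encode the active azimuths of one frame as a multi-hot target vector.

    Assumption: azimuth classes cover 0..360 degrees in equal steps.
    """
    num_classes = 360 // resolution_deg
    target = [0.0] * num_classes
    for az in azimuths_deg:
        idx = int(round((az % 360) / resolution_deg)) % num_classes
        target[idx] = 1.0
    return target


def combined_loss(probs, target, mse_weight=1.0, eps=1e-7):
    """Per-class binary cross-entropy plus weighted mean square error.

    Assumption: a stand-in for the paper's loss, whose exact form is not
    given in the abstract.
    """
    n = len(target)
    bce = 0.0
    mse = 0.0
    for p, t in zip(probs, target):
        q = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        bce -= t * math.log(q) + (1.0 - t) * math.log(1.0 - q)
        mse += (p - t) ** 2
    return bce / n + mse_weight * mse / n


def decode_frame(probs, threshold=0.5, resolution_deg=10):
    """Return the azimuths whose class probability reaches the threshold."""
    return [i * resolution_deg for i, p in enumerate(probs) if p >= threshold]


# One frame with three overlapping sources.
target = azimuths_to_multi_hot([30, 90, 270])
probs = [0.9 if t == 1.0 else 0.05 for t in target]  # hypothetical network output
azimuths = decode_frame(probs)
source_count = len(azimuths)
```

Because each azimuth class is thresholded independently, the number of active classes in a frame doubles as a source count, which is one way a multi-label output can avoid needing the number of sources in advance.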
Keywords
Binaural localization, multiple source localization, deep learning, multi-label classification