Detecting and counting overlapping speakers in distant speech scenarios

INTERSPEECH(2020)

引用 22|浏览42
暂无评分
摘要
We consider the problem of detecting the activity and counting overlapping speakers in distant-microphone recordings. We treat supervised Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD+OSD, and speaker counting as instances of a general Overlapped Speech Detection and Counting (OSDC) task, and we design a Temporal Convolu-tional Network (TCN) based method to address it. We show that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets. In particular our best architecture obtains, for OSD, 29.1% and 25.5% absolute improvement in Average Precision over previous techniques on, respectively, the AMI and CHiME-6 datasets. Furthermore, we find that generalization for joint VAD+OSD improves by using a speaker counting objective rather than a VAD+OSD objective. We also study the effectiveness of forced alignment based labeling and data augmentation, and show that both can improve OSD performance.
更多
查看译文
关键词
distant speech scenarios,speakers
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要