Robust statistical processing of TDOA estimates for distant speaker diarization.

European Signal Processing Conference (2017)

Abstract
Speaker diarization systems aim to segment an audio signal into homogeneous sections with only one active speaker and answer the question "who spoke when?" We present a novel approach to speaker diarization exploiting spatial information through robust statistical modeling of Time Difference of Arrival (TDOA) estimates obtained using pairs of microphones. The TDOAs are modeled with Gaussian Mixture Models (GMM) trained in a robust manner with the expectation-conditional maximization algorithm and a minorization-maximization approach. In situations of multiple microphone deployment, our method allows for the selection of the best microphone pair as part of the modeling and supports ad-hoc microphone placement. Such information can be useful for subsequent speech processing algorithms. We show that our method, which uses only spatial information, achieves up to a 36.1% relative reduction in speaker error time compared to an open-source toolkit using TDOA features, when tested on the NIST RT05 multiparty meeting database.
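The idea of modeling TDOAs with a GMM can be illustrated with a minimal sketch. It is not the method of the paper: it fits a one-dimensional mixture with plain EM rather than the robust expectation-conditional maximization and minorization-maximization training described above, and it runs on synthetic data. The function names (`fit_gmm_1d`, `demo`), the two-speaker setup, and the TDOA values in milliseconds are all assumptions made for illustration.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=100):
    """Fit a k-component 1-D GMM with plain EM (NOT the robust training of the paper)."""
    x = np.asarray(x, dtype=float)
    # Initialise means from evenly spaced quantiles so components start spread out.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, np.var(x) / k + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = (-0.5 * (x[:, None] - means) ** 2 / variances
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, resp


def demo():
    """Synthetic example: two speakers with hypothetical TDOAs near -0.4 ms and 0.3 ms."""
    rng = np.random.default_rng(0)
    true_labels = np.repeat([0, 1], [300, 200])
    tdoa_ms = np.where(true_labels == 0,
                       rng.normal(-0.4, 0.05, true_labels.size),
                       rng.normal(0.3, 0.05, true_labels.size))
    weights, means, variances, resp = fit_gmm_1d(tdoa_ms, k=2)
    # Each frame is attributed to the component with the highest responsibility.
    labels = resp.argmax(axis=1)
    return means, labels, true_labels


if __name__ == "__main__":
    means, labels, true_labels = demo()
    print("estimated component means (ms):", means)
    print("frames per component:", np.bincount(labels, minlength=2))
```

In this sketch each mixture component stands in for one speaker position as seen by a single microphone pair, and the per-frame component assignment gives a crude "who spoke when" labelling. Plain EM is sensitive to outlying TDOA estimates, which is the kind of weakness the robust training in the abstract is meant to address.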
Keywords
robust statistical processing, TDOA estimates, distant speaker diarization, speaker diarization systems, audio signal, active speaker, spatial information, robust statistical modeling, expectation-conditional maximization algorithm, minorization-maximization approach, multiple microphone deployment, microphone pair, ad-hoc microphone placement, speaker error time, TDOA features, time difference of arrival estimates, speech processing algorithms, Gaussian mixture models