QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023)

Abstract
We improve iterative separation-based speaker diarization (ISSD) with quality-aware dynamic masking (QDM) and call the resulting framework QDM-SSD. Compared with ISSD, QDM-SSD uses QDM to enhance the simulated data used for model adaptation, which alleviates the influence of errors in the speaker priors. In addition to purifying data quality, QDM-SSD makes the adaptation data sparse by automatically adjusting speaker overlap ratios according to data quality. Furthermore, sliding a window over the adaptation data allows clean regions in speech segments to be localized more accurately. Experiments on the two-speaker conversational telephone speech (CTS) corpus show that QDM-SSD achieves an 18.56% relative reduction in diarization error rate (DER) compared with ISSD. Moreover, QDM-SSD is shown to generalize to other two-speaker non-conversation telephone speech data sets on which ISSD fails to work. Finally, we demonstrate that QDM-SSD can serve as a front-end that improves the performance of back-end automatic speech recognition.
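To make the "relative reduction in DER" figure concrete, the sketch below computes DER with the commonly used definition (missed speech, false alarm, and speaker confusion durations divided by total reference speech time) and then the relative reduction between two systems. The function names and all durations are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Standard DER: summed error durations over total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Relative DER reduction of new_der with respect to baseline_der."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical durations in seconds (illustrative only, not from the paper).
baseline = diarization_error_rate(20.0, 10.0, 20.0, 1000.0)
improved = diarization_error_rate(15.0, 10.0, 15.0, 1000.0)
print(f"baseline DER: {baseline:.2%}, improved DER: {improved:.2%}")
print(f"relative reduction: {relative_reduction(baseline, improved):.2%}")
```

A relative reduction is measured against the baseline DER, so it is not the same as the absolute difference in percentage points between the two systems.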
Keywords
Data quality control,dynamic mask,speaker diarization,speech separation,voice activity detection