Detecting Overlapping Speech With Long Short-Term Memory Recurrent Neural Networks

14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013), Vols 1-5 (2013)

Abstract
Detecting segments of overlapping speech (when two or more speakers are active at the same time) is a challenging problem. Previous overlap detection systems have mostly been HMM-based and have employed a variety of audio features. In this work, we propose a novel overlap detection system using Long Short-Term Memory (LSTM) recurrent neural networks. The LSTMs generate framewise overlap predictions, which are then used for overlap detection. Furthermore, a tandem HMM-LSTM system is obtained by adding the LSTM predictions to the HMM feature set. Experiments on the AMI corpus show that the overlap detection performance of LSTMs is comparable to that of HMMs. Combining HMMs and LSTMs improves overlap detection by achieving higher recall.
Keywords
Speech Overlap Detection, Speaker Diarization, Neural Networks, Long Short-Term Memory
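
The abstract describes two uses of the framewise LSTM predictions: turning them into overlap segments, and appending them to the HMM feature set to form a tandem system. The following is a minimal sketch of both steps, not the authors' implementation. The helper names `tandem_features` and `frames_to_segments`, the 0.5 threshold, and the 10 ms frame shift are assumptions made for illustration; the LSTM and the HMM decoder themselves are not reproduced here.

```python
import numpy as np

def tandem_features(acoustic, overlap_scores):
    """Append one framewise overlap score to each acoustic feature vector.

    acoustic: array of shape (n_frames, n_features)
    overlap_scores: array of shape (n_frames,), e.g. LSTM outputs
    """
    acoustic = np.asarray(acoustic, dtype=float)
    overlap_scores = np.asarray(overlap_scores, dtype=float)
    if acoustic.shape[0] != overlap_scores.shape[0]:
        raise ValueError("need exactly one overlap score per frame")
    # The score becomes an extra feature column (hypothetical tandem layout).
    return np.hstack([acoustic, overlap_scores[:, None]])

def frames_to_segments(overlap_scores, threshold=0.5, frame_shift=0.01):
    """Convert framewise scores into (start_sec, end_sec) overlap segments.

    threshold and frame_shift are assumed values, not taken from the paper.
    """
    active = np.asarray(overlap_scores) > threshold
    # Pad with False so every active run has a rising and a falling edge.
    padded = np.concatenate([[False], active, [False]])
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]  # end index is exclusive
    return [(s * frame_shift, e * frame_shift) for s, e in zip(starts, ends)]
```

A simple threshold stands in here for whatever decision rule a real system would use; in practice the segment boundaries would likely be smoothed or decoded jointly with the HMM.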