ECAPA-TDNN Embeddings for Speaker Diarization

Nauman Dawalatabad, Mirco Ravanelli, Francois Grondin, Jenthe Thienpondt, Brecht Desplanques, Hwidong Na

Interspeech (2021)

Abstract
Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker-discriminative characteristics, and popular deep embeddings such as x-vectors are now a fundamental component of modern diarization systems. Recently, some improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, for instance, has shown impressive performance in the speaker verification domain, thanks to a carefully designed neural model. In this work, we extend, for the first time, the use of the ECAPA-TDNN model to speaker diarization. Moreover, we improve its robustness with a powerful augmentation scheme that concatenates several contaminated versions of the same signal within the same training batch. The ECAPA-TDNN model turns out to provide robust speaker embeddings under both close-talking and distant-talking conditions. Our results on the popular AMI meeting corpus show that our system significantly outperforms recently proposed approaches.
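
The augmentation scheme mentioned in the abstract, in which several contaminated versions of the same signal are concatenated within one training batch, might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which contaminations are used, so `add_noise` and `change_gain` are hypothetical stand-ins, and `augment_and_concat` is a name invented here rather than a function from the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]
Augmentation = Callable[[Waveform, random.Random], Waveform]


def add_noise(wav: Waveform, rng: random.Random, scale: float = 0.01) -> Waveform:
    """Hypothetical contamination: add Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, scale) for x in wav]


def change_gain(wav: Waveform, rng: random.Random,
                low: float = 0.5, high: float = 1.5) -> Waveform:
    """Hypothetical contamination: scale the whole signal by a random gain."""
    gain = rng.uniform(low, high)
    return [gain * x for x in wav]


def augment_and_concat(
    batch: Sequence[Waveform],
    labels: Sequence[int],
    augmentations: Sequence[Augmentation],
    seed: int = 0,
) -> Tuple[List[Waveform], List[int]]:
    """Return the clean batch followed by one contaminated copy per augmentation.

    The output batch has len(batch) * (1 + len(augmentations)) items, and the
    speaker labels are repeated so each copy keeps the label of its source.
    """
    rng = random.Random(seed)
    out_wavs: List[Waveform] = [list(w) for w in batch]  # clean copies first
    out_labels: List[int] = list(labels)
    for aug in augmentations:
        out_wavs.extend(aug(w, rng) for w in batch)
        out_labels.extend(labels)
    return out_wavs, out_labels


if __name__ == "__main__":
    clean = [[0.0, 0.1, -0.1, 0.2], [0.3, -0.2, 0.0, 0.1]]
    speakers = [0, 1]
    wavs, labs = augment_and_concat(clean, speakers, [add_noise, change_gain])
    print(len(wavs), labs)
```

With two clean signals and two contaminations, the enlarged batch holds six signals, so the embedding model sees clean and contaminated views of each speaker in the same optimization step.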
Key words
speaker diarization, speaker embedding, data augmentation, spectral clustering