Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization
arXiv (2024)
Abstract
End-to-end neural diarization (EEND) models offer significant improvements
over traditional embedding-based speaker diarization (SD) approaches but fall
short on generalizing to long-form audio with a large number of speakers.
The EEND-vector-clustering method mitigates this by combining local EEND with
global clustering of speaker embeddings from local windows, but this requires
an additional speaker embedding framework alongside the EEND module. In this
paper, we propose a novel framework applying EEND both locally and globally for
long-form audio without separate speaker embeddings. This approach achieves
significant relative DER reduction of 13
EEND on Callhome American English and RT03-CTS datasets respectively and
marginal improvements over EEND-vector-clustering without the need for
additional speaker embeddings. Furthermore, we discuss the computational
complexity of our proposed framework and explore strategies for reducing
processing times.
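
For readers unfamiliar with the metric, a relative DER reduction is the drop in diarization error rate expressed as a fraction of the baseline DER. The following is a minimal sketch of that arithmetic only; the function name and the input values are hypothetical illustrations and are not results from the paper.

```python
def relative_der_reduction(baseline_der: float, proposed_der: float) -> float:
    """Return the relative DER reduction, in percent, of a proposed system
    over a baseline system. Both inputs are DER values in the same unit."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - proposed_der) / baseline_der


if __name__ == "__main__":
    # Hypothetical DER values chosen for illustration, not taken from the paper.
    print(relative_der_reduction(baseline_der=10.0, proposed_der=9.0))
```

Under this definition, a positive value means the proposed system has a lower DER than the baseline, and a value of zero means no change.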