Sufficiency Quantification For Seamless Text-Independent Speaker Enrollment

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)(2018)

Cited 23|Views30
No score
Abstract
Text-independent speaker recognition (TI-SR) requires a lengthy enrollment process that involves asking dedicated time from the user to create a reliable model of their voice. Seamless enrollment is a highly attractive feature which refers to the enrollment process that happens in the background and asks for no dedicated time from the user. One of the key problems in a fully automated seamless enrollment process is to determine the sufficiency of a given utterance collection for the purpose of TI-SR. No known metric exists in the literature to quantify sufficiency. This paper introduces a novel metric called phoneme-richness score. Quality of a sufficiency metric can be assessed via its correlation with the TI-SR performance. Our assessment shows that phoneme-richness score achieves -0.96 correlation with TI-SR performance (measured in equal error rate), which is highly significant, whereas a naive sufficiency metric like speech duration achieves only -0.68 correlation.
More
Translated text
Key words
Speaker recognition, text-independent, seamless enrollment, sufficiency metric, phoneme-richness metric
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined