Effect of utterance duration and phonetic content on speaker identification using second-order statistical methods
EUROSPEECH(2024)
Abstract
Second-order statistical methods show very good results for automatic speaker
identification in controlled recording conditions. These approaches are
generally used on the entire speech material available. In this paper, we study
the influence of the content of the test speech material on the performance of
such methods, i.e. under a more analytical approach. The goal is to investigate
the kind of information that these methods use, and where it is
located in the speech signal. Liquids and glides together, vowels, and more
particularly nasal vowels and nasal consonants, are found to be especially
speaker-specific: test utterances of 1 second composed mostly of acoustic
material from one of these classes provide better speaker identification
results than phonetically balanced test utterances, even though training is
done, in both cases, with 15 seconds of phonetically balanced speech.
Nevertheless, results with other phoneme classes are never dramatically poor.
These results tend to show that the speaker-dependent information captured by
long-term second-order statistics is consistently common to all phonetic
classes, and that the homogeneity of the test material may improve the quality
of the estimates.
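
To make the idea of identification from long-term second-order statistics concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the acoustic features or the similarity measure, so the sketch assumes each utterance is already a matrix of feature frames and compares covariance matrices with an arithmetic-harmonic sphericity measure, one of several measures used in this family of methods. The function names `covariance`, `sphericity` and `identify` are hypothetical.

```python
import numpy as np


def covariance(frames):
    """Sample covariance of feature frames with shape (n_frames, n_dims)."""
    return np.cov(frames, rowvar=False)


def sphericity(cov_ref, cov_test):
    """Arithmetic-harmonic sphericity measure between two covariance matrices.

    Assumed measure (not stated in the abstract). It is zero when the two
    matrices are proportional and grows as their shapes diverge.
    """
    p = cov_ref.shape[0]
    t1 = np.trace(cov_test @ np.linalg.inv(cov_ref))
    t2 = np.trace(cov_ref @ np.linalg.inv(cov_test))
    return float(np.log(t1 * t2 / p**2))


def identify(test_frames, reference_covs):
    """Return the speaker whose reference covariance is closest to the test one.

    reference_covs maps a speaker name to a covariance matrix estimated
    from that speaker's training speech.
    """
    test_cov = covariance(test_frames)
    return min(
        reference_covs,
        key=lambda name: sphericity(reference_covs[name], test_cov),
    )
```

In this sketch, the reference covariances would be estimated from the 15 seconds of phonetically balanced training speech, and `test_frames` would hold the frames of a 1-second test utterance. A short test utterance yields a noisier covariance estimate, which is where the homogeneity of the test material may matter.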