Unsupervised Training of Siamese Networks for Speaker Verification.

INTERSPEECH (2020)

Cited by 14 | Views 7
Abstract
Speaker-labeled background data is an essential requirement for most state-of-the-art approaches in speaker recognition, e.g., x-vectors and i-vector/PLDA. In practice, however, it is difficult to access large amounts of labeled data. In this work, we propose Siamese networks for speaker verification that do not use speaker labels. We propose two different Siamese networks with two and three branches, respectively, where each branch is a CNN encoder. Since the goal is to avoid speaker labels, we generate the training pairs in an unsupervised manner. The client samples are selected within one database according to the highest cosine scores with the anchor in i-vector space; the impostor samples are selected in the same way but from another database. Our double-branch Siamese network performs binary classification using a cross-entropy loss during training, and in the testing phase we obtain speaker verification scores directly from its output layer. Our triple-branch Siamese network, in contrast, is trained to learn speaker embeddings using a triplet loss. During testing, we extract speaker embeddings from its output layer and score them using cosine similarity. The evaluation is performed on the VoxCeleb-1 database and shows that, with the proposed unsupervised systems used alone or in fusion, the results approach the supervised baseline.
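
The sketch below illustrates (it is not the authors' code) the unsupervised pair-selection step and a cosine-distance triplet loss consistent with the abstract. Function names, the number of selected samples per anchor, and the margin value are illustrative assumptions; the abstract only states that client samples are the highest-cosine neighbours of the anchor in i-vector space within the same database, and that impostors are drawn the same way from another database.

import numpy as np

def cosine_matrix(A, B):
    # Pairwise cosine similarities between rows of A and rows of B.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def select_training_pairs(anchor_ivecs, same_db_ivecs, other_db_ivecs,
                          n_client=5, n_impostor=5):
    # Pseudo-client samples: highest-cosine i-vectors within the same database
    # (column 0 is skipped, assuming the anchor itself is part of that pool).
    client_idx = np.argsort(-cosine_matrix(anchor_ivecs, same_db_ivecs),
                            axis=1)[:, 1:n_client + 1]
    # Pseudo-impostor samples: highest-cosine i-vectors from another database.
    impostor_idx = np.argsort(-cosine_matrix(anchor_ivecs, other_db_ivecs),
                              axis=1)[:, :n_impostor]
    return client_idx, impostor_idx

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge-style triplet loss on cosine similarities; the margin is an assumed value.
    def cos(a, b):
        return np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.maximum(0.0, cos(anchor, negative) - cos(anchor, positive) + margin).mean()

The returned index arrays would be used to form (anchor, client) and (anchor, impostor) pairs for the double-branch network, or (anchor, positive, negative) triplets for the triple-branch network.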
Keywords
i-vector, impostor selection, CNN, triplet loss