ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
CoRR(2024)
Abstract
This paper introduces ESPnet-SPK, a toolkit designed with several objectives
for training speaker embedding extractors. First, we provide an open-source
platform for researchers in the speaker recognition community to effortlessly
build models. We provide several models, ranging from x-vector to recent
SKA-TDNN. Through the modularized architecture design, variants can be
developed easily. We also aim to bridge the developed models with other domains,
so that the broader research community can readily incorporate
state-of-the-art embedding extractors. Pre-trained embedding extractors can be
accessed in an off-the-shelf manner and we demonstrate the toolkit's
versatility by showcasing its integration with two tasks. Another goal is to
integrate with diverse self-supervised learning features. We release a
reproducible recipe that achieves an equal error rate of 0.39% on the Vox1-O
evaluation protocol using WavLM-Large with ECAPA-TDNN.
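
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate over a set of verification trials. The following is a minimal sketch of that computation, not the toolkit's own scoring code: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Hypothetical helper for illustration only.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = float("inf"), None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial list (made-up scores, not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(toy_scores, toy_labels))  # → 0.25
```

In this toy case one of four target trials is rejected and one of four non-target trials is accepted at the balancing threshold, so the EER is 25%. A reported EER of 0.39% means both error rates are about 0.39% at that operating point.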