Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection
arXiv (2024)
Abstract
Self-supervised learning (SSL) speech representation models, trained on large
speech corpora, have demonstrated effectiveness in extracting hierarchical
speech embeddings through multiple transformer layers. However, the behavior of
these embeddings in specific tasks remains uncertain. This paper investigates
the multi-layer behavior of the WavLM model in anti-spoofing and proposes an
attentive merging method to leverage the hierarchical hidden embeddings.
Results demonstrate the feasibility of fine-tuning WavLM to achieve a best
equal error rate (EER) of 0.65% on the ASVspoof 2019LA evaluation set,
together with competitive EERs on the 2021LA and 2021DF evaluation sets.
Notably, we find that the early hidden transformer layers of the WavLM large
model contribute significantly to the anti-spoofing task, enabling
computational savings by utilizing only a partial pre-trained model.
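The core idea of attentive merging — a learned, softmax-normalized weighting over the hidden embeddings from the L transformer layers — can be sketched as follows. This is a minimal NumPy illustration with hypothetical shapes and a single scalar score per layer; the paper's actual module may compute attention at a finer granularity and is trained jointly with the downstream classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def attentive_merge(hidden_states, w):
    """Merge per-layer hidden embeddings with learned attention weights.

    hidden_states: (L, T, D) stack of L transformer-layer outputs
                   (T frames, D-dim embeddings).
    w:             (L,) unnormalized layer scores (learnable in practice).
    Returns the (T, D) merged embedding and the softmax layer weights.
    """
    # numerically stable softmax over the layer axis
    alpha = np.exp(w - w.max())
    alpha = alpha / alpha.sum()
    # weighted sum over layers: (L,) x (L, T, D) -> (T, D)
    merged = np.tensordot(alpha, hidden_states, axes=1)
    return merged, alpha

# Toy example: 4 layers, 10 frames, 8-dim embeddings (hypothetical sizes).
H = rng.standard_normal((4, 10, 8))
w = np.array([2.0, 1.0, 0.0, -1.0])  # scores favoring early layers
merged, alpha = attentive_merge(H, w)
```

In a real system the scores `w` would be parameters optimized with the anti-spoofing loss, so the softmax weights reveal which WavLM layers the detector relies on — the mechanism behind the observation that early layers dominate.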