Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Fine-tuning a pre-trained Transformer model (PTM) for speech applications in a parameter-efficient manner offers the dual benefits of reducing memory and leveraging the rich feature representations learned from massive unlabeled datasets. However, existing parameter-efficient fine-tuning approaches adapt either the classification head or the whole PTM. The former is unsuitable when the PTM is used as a feature extractor, and the latter does not leverage the different degrees of feature abstraction at different Transformer layers. We propose two solutions to address these limitations. First, we apply speaker prompt tuning to update the task-specific embeddings of a PTM. The tuning enhances speaker feature relevance in the speaker embeddings through cross-attention between prompt and speaker features. Second, we insert adapter blocks into the Transformer encoders and their outputs. This novel arrangement enables the fine-tuned PTM to determine the most suitable layers from which to extract relevant information for the downstream task. Extensive speaker verification experiments on VoxCeleb and CU-MARVEL demonstrate higher parameter efficiency and better model adaptability of the proposed methods than the existing ones.
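The two mechanisms described above can be illustrated with a minimal sketch, assuming a PyTorch-style frozen encoder with hidden size `dim`; the class names, prompt count, and bottleneck size below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a residual bottleneck adapter and a
# speaker-prompt cross-attention module over frozen PTM features.
import torch
import torch.nn as nn


class AdapterBlock(nn.Module):
    """Residual bottleneck adapter inserted after an encoder layer or its output."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)   # down-projection
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)     # up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the adapter parameters are trained; the PTM weights stay frozen.
        return x + self.up(self.act(self.down(self.norm(x))))


class SpeakerPromptAttention(nn.Module):
    """Learnable speaker prompts attend to frame-level features via cross-attention."""

    def __init__(self, dim: int, num_prompts: int = 8, num_heads: int = 4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, dim) produced by the frozen PTM encoder.
        q = self.prompts.unsqueeze(0).expand(feats.size(0), -1, -1)
        out, _ = self.attn(query=q, key=feats, value=feats)
        # Pool the attended prompts into a single speaker embedding.
        return out.mean(dim=1)


if __name__ == "__main__":
    feats = torch.randn(2, 200, 768)             # dummy frame-level features
    emb = SpeakerPromptAttention(768)(AdapterBlock(768)(feats))
    print(emb.shape)                             # torch.Size([2, 768])
```

In such a setup only the adapter and prompt parameters are updated, which is where the parameter efficiency comes from; placing adapters both inside the encoder stack and on its outputs lets training weight the layers that are most useful for speaker verification.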
Keywords
Speaker verification, parameter-efficient tuning, prompt tuning, Transformer adapter, pre-trained Transformer