PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning
CoRR (2024)
Abstract
Recent advancements in large language models (LLMs) have raised concerns
about inference costs, increasing the need for research into model compression.
While knowledge distillation (KD) is a prominent method for this, research on
KD for generative language models like LLMs is relatively sparse, and the
approach of distilling student-friendly knowledge, which has shown promising
performance in KD for classification models, remains unexplored in generative
language models. To explore this approach, we propose PromptKD, a simple yet
effective method that utilizes prompt tuning - for the first time in KD - to
enable generative language models to transfer student-friendly knowledge.
Unlike previous works in classification that require fine-tuning the entire
teacher model for extracting student-friendly knowledge, PromptKD achieves
similar effects by adding a small number of prompt tokens and tuning only the
prompt with student guidance. Extensive experiments on instruction-following
datasets using the GPT-2 model family show that PromptKD achieves
state-of-the-art performance while adding only 0.0007% of the
parameters as prompts. Further analysis suggests that distilling
student-friendly knowledge alleviates exposure bias effectively throughout the
entire training process, leading to performance enhancements.
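The core mechanism described above (freezing both teacher and student weights and tuning only a small added prompt on the teacher, guided by the student, so the teacher's output distribution becomes easier for the student to match) can be illustrated with a toy sketch. Everything below is a hypothetical simplification for illustration: the "models" are single linear heads, the prompt is added to the input vector rather than prepended as tokens, and the reverse-KL objective is one common choice in KD for generative LMs, not necessarily the paper's exact loss.

```python
import numpy as np

# Toy sketch of the PromptKD idea: only a tiny "soft prompt" on the teacher
# side is trainable; teacher and student weights stay frozen throughout.
rng = np.random.default_rng(0)
vocab, d_model = 8, 4

teacher_W = rng.normal(size=(vocab, d_model))   # frozen teacher head
student_W = rng.normal(size=(vocab, d_model))   # frozen student head
prompt = np.zeros(d_model)                      # trainable prompt (tiny vs. model)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def teacher_probs(x, p):
    # The prompt shifts the teacher's input representation -- a stand-in for
    # prepending learnable prompt tokens to the teacher's input sequence.
    return softmax(teacher_W @ (x + p))

def student_probs(x):
    return softmax(student_W @ x)

def kd_loss(x, p):
    # Reverse KL between the (frozen) student and the prompted teacher:
    # minimizing over p moves the teacher's distribution toward one the
    # student can match -- i.e., "student-friendly" knowledge.
    s, t = student_probs(x), teacher_probs(x, p)
    return float(np.sum(s * (np.log(s) - np.log(t))))

# Tune ONLY the prompt, here via finite-difference gradient descent.
x = rng.normal(size=d_model)
eps, lr = 1e-5, 0.05
loss_before = kd_loss(x, prompt)
for _ in range(300):
    grad = np.zeros_like(prompt)
    for i in range(d_model):
        e = np.zeros_like(prompt)
        e[i] = eps
        grad[i] = (kd_loss(x, prompt + e) - kd_loss(x, prompt - e)) / (2 * eps)
    prompt -= lr * grad

print(kd_loss(x, prompt) < loss_before)  # prompt tuning reduced the KD loss
```

The sketch mirrors the parameter-efficiency claim: only the `d_model`-sized prompt is updated, a vanishingly small fraction of the total parameters, while both weight matrices are untouched.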