TinyLLM: Learning a Small Student from Multiple Large Language Models
CoRR (2024)
Abstract
Transferring the reasoning capability of stronger large language models
(LLMs) to smaller ones is appealing, since smaller LLMs can be deployed more
flexibly and at lower cost. Among existing solutions, knowledge
distillation stands out for its efficiency and generalization.
However, existing methods suffer from several drawbacks, including limited
knowledge diversity and a lack of rich contextual information. To address these
problems and facilitate the learning of compact language models, we propose
TinyLLM, a novel knowledge distillation paradigm to learn a small student LLM
from multiple large teacher LLMs. In particular, we encourage the student LLM
to not only generate the correct answers but also understand the rationales
behind these answers. Given that different LLMs possess diverse reasoning
skills, we guide the student model to assimilate knowledge from various teacher
LLMs. We further introduce an in-context example generator and a
teacher-forcing Chain-of-Thought strategy to ensure that the rationales are
accurate and grounded in contextually appropriate scenarios. Extensive
experiments on six datasets across two reasoning tasks demonstrate the
superiority of our method. Results show that TinyLLM can outperform large
teacher LLMs significantly, despite having a considerably smaller model size.
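To make the multi-teacher idea concrete, below is a minimal, hypothetical sketch of rationale distillation from several teachers. It is not the authors' released code: the toy model (`ToyStudentLM`), the loss helper (`lm_loss`), and the weighting term `alpha` are all assumptions introduced for illustration. It assumes each teacher has already produced a rationale for a question, and trains a small student to generate both the final answer and each teacher's rationale via teacher-forced next-token prediction.

```python
# Hypothetical sketch of multi-teacher rationale distillation (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, PAD = 1000, 64, 0

class ToyStudentLM(nn.Module):
    """Stand-in for a small student LLM (embedding + GRU + LM head)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM, padding_idx=PAD)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):                       # ids: (B, T)
        h, _ = self.rnn(self.emb(ids))
        return self.head(h)                       # logits: (B, T, VOCAB)

def lm_loss(model, input_ids, target_ids):
    """Next-token cross-entropy on the target span (teacher forcing)."""
    logits = model(input_ids)
    return F.cross_entropy(
        logits.reshape(-1, VOCAB), target_ids.reshape(-1), ignore_index=PAD
    )

# Toy batch: token ids for a question, its answer, and one rationale per teacher.
question   = torch.randint(1, VOCAB, (2, 8))
answer     = torch.randint(1, VOCAB, (2, 4))
rationales = [torch.randint(1, VOCAB, (2, 12)) for _ in range(3)]  # 3 teachers

student = ToyStudentLM()
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

alpha = 0.5   # assumed weight balancing the answer vs. rationale objectives
for step in range(3):
    # Answer objective: predict answer tokens conditioned on the question.
    ans_in  = torch.cat([question, answer], dim=1)[:, :-1]
    ans_tgt = torch.cat([torch.full_like(question, PAD), answer], dim=1)[:, 1:]
    loss = lm_loss(student, ans_in, ans_tgt)

    # Rationale objective: predict each teacher's rationale after the question,
    # averaging over teachers so the student absorbs diverse reasoning styles.
    for r in rationales:
        r_in  = torch.cat([question, r], dim=1)[:, :-1]
        r_tgt = torch.cat([torch.full_like(question, PAD), r], dim=1)[:, 1:]
        loss = loss + alpha * lm_loss(student, r_in, r_tgt) / len(rationales)

    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: loss {loss.item():.3f}")
```

The design choice the sketch illustrates is that the student is supervised on both the answers and the rationales from several teachers, rather than on answers from a single teacher alone; the in-context example generation and teacher-forcing Chain-of-Thought components described in the abstract are not modeled here.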