TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance
CoRR (2024)
Abstract
Large Language Models (LLMs) have recently showcased remarkable reasoning
abilities. However, larger models often surpass their smaller counterparts on
reasoning tasks, posing the challenge of effectively transferring these
capabilities from larger models to smaller ones. Existing approaches heavily rely on extensive
fine-tuning data or continuous interactions with a superior teacher LLM during
inference. We introduce a principle-based teacher-student framework called
“Teaching via Principle Discovery” (TPD) to address these limitations.
Inspired by human learning mechanisms, TPD mimics the interaction between a
teacher and a student using a principle-based approach. The teacher LLM
generates problem-solving instructions and corrective principles based on the
student LLM's errors. These principles guide the refinement of instructions and
the selection of instructive examples from a validation set. This enables the
student model to learn from both the teacher's guidance and its own mistakes.
Once the student model begins making inferences, TPD requires no further
intervention from the teacher LLM or humans. Through extensive experiments
across eight reasoning tasks, we demonstrate the effectiveness of TPD. Compared
to standard chain-of-thought prompting, TPD significantly improves the student
model's performance, achieving 6.2% improvement on average.
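The teach-then-infer workflow described above can be sketched as a short script. This is a minimal illustration under assumptions, not the paper's actual implementation: the functions `teacher_generate_principles`, `select_instructive_examples`, and `build_prompt`, and the toy error/validation data, are all hypothetical stand-ins for the teacher LLM's principle generation, example selection from a validation set, and the refined instruction handed to the student.

```python
# Hypothetical sketch of a TPD-style loop: the teacher turns student errors
# into corrective principles, uses them to pick instructive validation
# examples, and assembles a refined instruction for the student. All names
# and data here are illustrative assumptions, not the paper's code.

def teacher_generate_principles(errors):
    """Placeholder for the teacher LLM deriving corrective principles
    from the student's observed mistakes."""
    return [f"Avoid mistake pattern: {e}" for e in errors]

def select_instructive_examples(validation_set, principles):
    """Placeholder selection: keep validation examples whose error tags
    are covered by at least one corrective principle."""
    covered = {p.split(": ")[1] for p in principles}
    return [ex for ex in validation_set
            if any(tag in covered for tag in ex["tags"])]

def build_prompt(instruction, principles, examples):
    """Assemble the refined instruction the student model uses at
    inference time, with no further teacher involvement."""
    lines = [instruction, "Principles:"]
    lines += [f"- {p}" for p in principles]
    lines += ["Examples:"] + [ex["text"] for ex in examples]
    return "\n".join(lines)

# Toy stand-ins for student errors and a validation set.
errors = ["sign flip", "unit mismatch"]
validation_set = [
    {"text": "Example A", "tags": ["sign flip"]},
    {"text": "Example B", "tags": ["rounding"]},
]

principles = teacher_generate_principles(errors)
examples = select_instructive_examples(validation_set, principles)
prompt = build_prompt("Solve the problem step by step.", principles, examples)
print(prompt)
```

Once the prompt is assembled, the student model runs on its own, matching the abstract's claim that no further teacher or human intervention is needed at inference time.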