Latent Skill Discovery for Chain-of-Thought Reasoning
CoRR (2023)
Abstract
Recent advances in Large Language Models (LLMs) have led to the emergence of
chain-of-thought (CoT) prompting, a prompting strategy that inserts
intermediate rationale steps between questions and answers when constructing
prompts. Conditioned on these prompts, LLMs can effectively learn in context to
generate rationales that lead to more accurate answers than when answering the
same question directly. To design LLM prompts, one important setting, called
demonstration selection, considers selecting demonstrations from an example
bank. Existing methods use various heuristics for this selection, but for CoT
prompting, which involves unique rationales, it is essential to base the
selection upon the intrinsic skills that CoT rationales need, for instance, the
skills of addition or subtraction for math word problems.
To address this requirement, we introduce a novel approach named Reasoning
Skill Discovery (RSD) that uses unsupervised learning to create a latent space
representation of rationales, called a reasoning skill. Simultaneously, RSD
learns a reasoning policy to determine the required reasoning skill for a given
question. This can then guide the selection of examples that demonstrate the
required reasoning skills. Our approach offers several desirable properties: it
is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference
or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms
existing methods by up to 6% in answer accuracy across multiple
reasoning tasks.
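
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the latent skill vectors have already been produced by some trained encoder and reasoning policy, and stands in for them with hand-written 2-D vectors. The function names (`cosine`, `select_demonstrations`, `build_cot_prompt`), the toy example bank, and the use of cosine similarity as the matching rule are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_demonstrations(question_skill, bank, k=2):
    """Return the k bank examples whose skill vectors best match question_skill."""
    ranked = sorted(bank, key=lambda ex: cosine(question_skill, ex["skill"]), reverse=True)
    return ranked[:k]

def build_cot_prompt(demos, question):
    """Assemble a CoT prompt: question, rationale, and answer per demo, then the new question."""
    parts = [
        f"Q: {d['question']}\nA: {d['rationale']} The answer is {d['answer']}."
        for d in demos
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Toy example bank. The skill vectors are made up for this sketch:
# axis 0 loosely means "addition", axis 1 loosely means "subtraction".
bank = [
    {"question": "Tom has 3 apples and buys 4 more. How many now?",
     "rationale": "3 + 4 = 7.", "answer": "7", "skill": [0.9, 0.1]},
    {"question": "Ann had 9 pens and lost 2. How many are left?",
     "rationale": "9 - 2 = 7.", "answer": "7", "skill": [0.1, 0.9]},
    {"question": "A shelf holds 5 books; 6 are added. How many in total?",
     "rationale": "5 + 6 = 11.", "answer": "11", "skill": [0.8, 0.2]},
]

# Stand-in for the skill a reasoning policy might predict for a subtraction question.
predicted_skill = [0.2, 0.95]
demos = select_demonstrations(predicted_skill, bank, k=1)
prompt = build_cot_prompt(demos, "Sam had 12 coins and spent 5. How many remain?")
print(prompt)
```

In this toy setting the subtraction demonstration is ranked first for the subtraction question. Because the selection operates only on vectors, it needs no LLM call, which is consistent with the sample-efficiency and LLM-agnostic properties claimed above.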