Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning
CoRR(2023)
摘要
Large language models (LLMs) have shown impressive capabilities in various
tasks, yet they still struggle with math reasoning. Despite efforts to optimize
Chain-of-Thoughts (CoT) prompts and fine-tune LLMs, the potential of few-shot
learning remains unexplored. In this work, we propose CoT-Max, a novel approach
pushing the boundaries of few-shot CoT learning to improve LLM math reasoning
capabilities. CoT-Max addresses the challenges of the selection of useful
examples and limited number of examples due to restricted context window
length. Inspired by our observation that natural language inputs contain many
redundancy, we propose a coarse-to-fine pruner as a plug-and-play module for
LLMs, which first identifies crucial CoT examples from a large batch and then
further prunes unimportant tokens. To train the pruner, we collect a math
reasoning dataset with diverse difficulty and steps, introduce a reward to
measure both the input's effectiveness for math reasoning and token length
constraints, and propose a novel training approach with reinforcement learning.
As a result, CoT-Max significantly outperforms CoT and few-shot prompting
baselines across various LLMs (LLaMA2-7B, 13B, 70B) and 5 mathematical
datasets, achieving up to 4.55% absolute improvements. Remarkably, without any
fine-tuning, LLaMA2-70B with CoT-Max surpasses GPT-3.5 and a wide range of
larger LLMs (PaLM, Minerva, etc.) on the GSM8K.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要