Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
CoRR (2024)
Abstract
Large language models (LLMs) exhibit robust problem-solving capabilities for
diverse tasks. However, most LLM-based agents are designed as specific task
solvers with sophisticated prompt engineering, rather than agents capable of
learning and evolving through interactions. These task solvers require manually
crafted prompts to convey task rules and regulate LLM behaviors, which leaves
them inherently unable to handle complex dynamic scenarios, e.g., large
interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent
with Policy-level Reflection and Optimization that can learn a wealth of
expertise from interactive experiences and progressively improve its behavioral
policy.
Specifically, it involves a dynamic belief generation and reflection process
for policy evolution. Rather than action-level reflection, Agent-Pro
iteratively reflects on past trajectories and beliefs, fine-tuning its
irrational beliefs for a better policy. Moreover, a depth-first search is
employed for policy optimization, ensuring continual enhancement in policy
payoffs. Agent-Pro is evaluated on two games, Blackjack and Texas Hold'em,
where it outperforms vanilla LLMs and specialized models. Our results show that
Agent-Pro can learn and evolve in complex, dynamic scenarios, which also
benefits numerous LLM-based applications.
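The abstract mentions a depth-first search used for policy optimization so that policy payoffs keep improving. As a rough illustration only, the sketch below shows a generic payoff-gated DFS over candidate policy revisions; it is not the authors' implementation. The names `dfs_policy_search`, `toy_payoff`, `propose_additions`, and the rule values are hypothetical: the toy payoff stands in for evaluating a policy through actual game play, and `propose_additions` stands in for LLM reflection on past trajectories and beliefs.

```python
from typing import Callable, List, Tuple

# A "policy" is modeled here as an ordered tuple of natural-language rules
# (a hypothetical stand-in for the beliefs/instructions an LLM agent carries).
Policy = Tuple[str, ...]


def dfs_policy_search(
    policy: Policy,
    payoff: Callable[[Policy], float],
    propose: Callable[[Policy], List[Policy]],
    max_depth: int,
) -> Tuple[Policy, float]:
    """Depth-first search over candidate policy revisions.

    A revision is explored only if it strictly beats its parent's payoff,
    so payoff increases monotonically along every explored path.
    Returns the best (policy, payoff) pair found.
    """
    parent_score = payoff(policy)
    best_policy, best_score = policy, parent_score
    if max_depth == 0:
        return best_policy, best_score
    for candidate in propose(policy):
        if payoff(candidate) <= parent_score:
            continue  # prune revisions that do not improve on the parent
        found, found_score = dfs_policy_search(candidate, payoff, propose, max_depth - 1)
        if found_score > best_score:
            best_policy, best_score = found, found_score
    return best_policy, best_score


# Hypothetical toy setup: each rule has a fixed value, and a policy's payoff is
# the sum of its rules' values. A real agent would estimate payoff from rollouts.
RULE_VALUES = {
    "bluff often": -1.0,
    "fold weak hands": 2.0,
    "track opponent style": 3.0,
}


def toy_payoff(policy: Policy) -> float:
    return sum(RULE_VALUES[rule] for rule in policy)


def propose_additions(policy: Policy) -> List[Policy]:
    # Propose every one-rule extension, in sorted order for determinism.
    return [policy + (rule,) for rule in sorted(RULE_VALUES) if rule not in policy]


if __name__ == "__main__":
    best, score = dfs_policy_search((), toy_payoff, propose_additions, max_depth=3)
    print(best, score)  # prints ('fold weak hands', 'track opponent style') 5.0
```

Because a child is expanded only when it beats its parent, the search never keeps a revision that lowers payoff (in this toy, "bluff often" is always pruned), and the depth limit bounds how many successive refinements are attempted.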