Tree Search for Language Model Agents
arXiv (2024)
Abstract
Autonomous agents powered by language models (LMs) have demonstrated promise
in their ability to perform decision-making tasks such as web automation.
However, a key limitation remains: LMs, primarily optimized for natural
language understanding and generation, struggle with multi-step reasoning,
planning, and using environmental feedback when attempting to solve realistic
computer tasks. Towards addressing this, we propose an inference-time search
algorithm for LM agents to explicitly perform exploration and multi-step
planning in interactive web environments. Our approach is a form of best-first
tree search that operates within the actual environment space, and is
complementary to most existing state-of-the-art agents. It is the first tree
search algorithm for LM agents that shows effectiveness on realistic web tasks.
On the challenging VisualWebArena benchmark, applying our search algorithm on
top of a GPT-4o agent yields a 39.7% relative increase in success rate compared
to the same baseline without search, setting a state-of-the-art success rate of
26.4%. [...]
baseline agent, setting a competitive success rate of 19.2%. Our experiments
highlight the effectiveness of search for web agents, and we demonstrate that
performance scales with increased test-time compute. We conduct a thorough
analysis of our results to highlight improvements from search, limitations, and
promising directions for future work. Our code and models are publicly released
at https://jykoh.com/search-agents.
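
The best-first tree search described above can be illustrated with a minimal
sketch. This is not the authors' implementation: the toy integer environment,
the function names (best_first_search, propose_actions, step, value), and the
budget, max_depth, and threshold parameters are all assumptions made for
illustration. In an actual agent, actions would presumably be proposed by the
LM and states scored by some learned or LM-based value function. The sketch
also treats step as a pure function, so it sidesteps the question of how a
live web environment would return to an earlier state.

```python
import heapq


def best_first_search(start, propose_actions, step, value,
                      budget=20, max_depth=5, threshold=1.0):
    """Expand the highest-valued frontier state first.

    Returns (best_value, best_state, action_path). All parameter names
    are hypothetical and chosen only for this illustration.
    """
    counter = 0  # tie-breaker so the heap never compares states directly
    start_value = value(start)
    # heapq is a min-heap, so values are negated to pop the best state first.
    frontier = [(-start_value, counter, 0, start, [])]
    best = (start_value, start, [])
    expanded = 0

    while frontier and expanded < budget:
        neg_v, _, depth, state, path = heapq.heappop(frontier)
        expanded += 1
        v = -neg_v
        if v > best[0]:
            best = (v, state, path)
        if v >= threshold:  # good enough: stop searching
            break
        if depth >= max_depth:  # do not expand beyond the depth limit
            continue
        for action in propose_actions(state):
            child = step(state, action)
            counter += 1
            heapq.heappush(
                frontier,
                (-value(child), counter, depth + 1, child, path + [action]),
            )
    return best


# Toy stand-in for an environment: reach TARGET starting from 1.
TARGET = 10


def propose_actions(state):
    return ["+1", "*2"]


def step(state, action):
    return state + 1 if action == "+1" else state * 2


def value(state):
    # 1.0 at the target, decreasing linearly with distance, floored at 0.
    return max(0.0, 1.0 - abs(TARGET - state) / TARGET)


best_value, best_state, path = best_first_search(
    1, propose_actions, step, value
)
print(best_value, best_state, path)
```

Raising budget lets the search expand more states before committing to the
best one found, which is one simple way to picture spending more test-time
compute.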