GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
arXiv (2024)
Abstract
The rapid advancement of large language models (LLMs) has catalyzed the
deployment of LLM-powered agents across numerous applications, raising new
concerns regarding their safety and trustworthiness. Existing methods for
enhancing the safety of LLMs are not directly transferable to LLM-powered
agents due to their diverse objectives and output modalities. In this paper, we
propose GuardAgent, the first LLM agent as a guardrail to other LLM agents.
Specifically, GuardAgent oversees a target LLM agent by checking whether its
inputs/outputs satisfy a set of given guard requests defined by the users.
GuardAgent comprises two steps: 1) creating a task plan by analyzing the
provided guard requests, and 2) generating guardrail code based on the task
plan and executing the code by calling APIs or using external engines. In both
steps, an LLM is utilized as the core reasoning component, supplemented by
in-context demonstrations retrieved from a memory module. Such
knowledge-enabled reasoning allows GuardAgent to understand various textual
guard requests and accurately "translate" them into executable code that
provides reliable guardrails. Furthermore, GuardAgent is equipped with an
extendable toolbox containing functions and APIs and requires no additional LLM
training, which underscores its generalization capabilities and low operational
overhead. Additionally, we propose two novel benchmarks: an EICU-AC benchmark
for assessing privacy-related access control for healthcare agents and a
Mind2Web-SC benchmark for safety evaluation of web agents. We show the
effectiveness of GuardAgent on these two benchmarks, where it achieves up to
98.7% accuracy in moderating invalid inputs and outputs for the two types of
agents.
We also show that GuardAgent is able to define novel functions in
adaptation to emergent LLM agents and guard requests, further demonstrating its
strong generalization capabilities.
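The two-step pipeline described above (retrieve demonstrations from memory, plan from the guard request, then generate and execute guardrail code against the toolbox) can be sketched as follows. This is a minimal illustration under stated assumptions: all function names, the memory format, and the toolbox entries are hypothetical stand-ins, and the LLM-driven planning and code-generation steps are stubbed with simple lookups rather than model calls.

```python
# Hedged sketch of GuardAgent's two-step guardrail flow. All names here
# (retrieve_demonstrations, plan_task, toolbox keys) are hypothetical;
# the paper's actual prompts, memory schema, and APIs differ.

def retrieve_demonstrations(memory, guard_request):
    """Fetch the in-context demonstration whose stored request shares
    the most keywords with the incoming guard request."""
    def overlap(entry):
        return len(set(entry["request"].split()) & set(guard_request.split()))
    return max(memory, key=overlap)

def plan_task(guard_request, demo):
    """Step 1: the LLM would analyze the textual guard request into an
    action plan; here it is stubbed with the retrieved demonstration's plan."""
    return {"request": guard_request, "steps": demo["plan"]}

def generate_and_run_guardrail(plan, toolbox, agent_io):
    """Step 2: the LLM would emit guardrail code that calls toolbox APIs;
    here each plan step directly invokes a named check from the toolbox."""
    for step in plan["steps"]:
        check = toolbox[step]  # e.g. an access-control predicate
        if not check(agent_io):
            return "deny"
    return "allow"

# Toy memory module and toolbox standing in for the real components.
memory = [{"request": "check role access", "plan": ["role_allowed"]}]
toolbox = {"role_allowed": lambda io: io["role"] in io["allowed_roles"]}

demo = retrieve_demonstrations(memory, "check nurse role access")
plan = plan_task("check nurse role access", demo)
verdict = generate_and_run_guardrail(
    plan, toolbox, {"role": "nurse", "allowed_roles": ["physician"]}
)
print(verdict)  # "deny": the target agent's input violates the guard request
```

Because the reasoning is confined to producing a plan and code while enforcement happens in deterministic executed checks, the guardrail decision is reproducible in a way that free-form LLM judgments are not.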