TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution
CoRR(2024)
Abstract
The emergence of LLM-based agents has garnered considerable attention, yet
their trustworthiness remains an under-explored area. As agents can directly
interact with the physical environment, their reliability and safety is
critical. This paper presents an Agent-Constitution-based agent framework,
TrustAgent, an initial investigation into improving the safety dimension of
trustworthiness in LLM-based agents. This framework consists of threefold
strategies: pre-planning strategy which injects safety knowledge to the model
prior to plan generation, in-planning strategy which bolsters safety during
plan generation, and post-planning strategy which ensures safety by
post-planning inspection. Through experimental analysis, we demonstrate how
these approaches can effectively elevate an LLM agent's safety by identifying
and preventing potential dangers. Furthermore, we explore the intricate
relationships between safety and helpfulness, and between the model's reasoning
ability and its efficacy as a safe agent. This paper underscores the imperative
of integrating safety awareness and trustworthiness into the design and
deployment of LLM-based agents, not only to enhance their performance but also
to ensure their responsible integration into human-centric environments. Data
and code are available at https://github.com/agiresearch/TrustAgent.
MoreTranslated text
AI Read Science
Must-Reading Tree
Example
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined