Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
CoRR (2024)
Abstract
Generative artificial intelligence (AI) is interacting with people at an
unprecedented scale, offering new avenues for immense positive impact, but also
raising widespread concerns around the potential for individual and societal
harm. Today, the predominant paradigm for human-AI safety focuses on
fine-tuning the generative model's outputs to better agree with human-provided
examples or feedback. In reality, however, the consequences of an AI model's
outputs cannot be determined in an isolated context: they are tightly entangled
with the responses and behavior of human users over time. In this position
paper, we argue that meaningful safety assurances for these AI technologies can
only be achieved by reasoning about how the feedback loop formed by the AI's
outputs and human behavior may drive the interaction towards different
outcomes. To this end, we envision a high-value window of opportunity to bridge
the rapidly growing capabilities of generative AI and the dynamical safety
frameworks from control theory, laying a new foundation for human-centered AI
safety in the coming decades.