From Vulnerabilities to Improvements- A Deep Dive into Adversarial Testing of AI Models

Brendan Hannon,Yulia Kumar, Peter Sorial,J. Jenny Li,Patricia Morreale

2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE)(2023)

引用 0|浏览0
暂无评分
摘要
The security vulnerabilities inherent in large language models (LLMs), such as OpenAI's ChatGPT-3.5 and ChatGPT -4, Bing Bot, and Google's Bard, are explored in this paper. The focus is on the susceptibility of these models to malicious prompting and the potential for generating unethical content. An investigation is conducted into the responses these models provide when tasked with completing a movie script involving a character disseminating information about murder, weapons, and drugs. The analysis reveals that, despite the presence of filters designed to prevent the generation of unethical or harmful content, these models can be manipulated through malicious prompts to produce inappropriate and even illegal responses. This discovery underscores the urgent need for a comprehensive understanding of these vulnerabilities, as well as the development of effective measures to enhance the security and reliability of LLMs. This paper offers valuable insights into the security vulnerabilities that arise when these models are prompted to generate malicious content.
更多
查看译文
关键词
prompt engineering,chatbots,ChatGPT,Bard,security vulnerabilities in chatbots,risk mitigation,adversarial attacks
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要