Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis

Bradley Menz, Nicole Kuderer,Stephen Bacchi,Natansh Modi,Benjamin Chin-Yee, Tiancheng Hu, Ceara Rickard, Mark Haseloff,Agnes Vitry,Ross Mckinnon,Ganessan Kichenadasse,Andrew Rowland,Michael Sorich,Ashley Hopkins

BMJ-BRITISH MEDICAL JOURNAL（2024）

引用 0|浏览14

暂无评分

摘要

OBJECTIVES To evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities. DESIGN Repeated cross sectional analysis. SETTING Publicly accessible LLMs. METHODS In a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI's GPT-4 (via ChatGPT and Microsoft's Copilot), Google's PaLM 2 and newly released Gemini Pro (via Bard), Anthropic's Claude 2 (via Poe), and Meta's Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要