Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
ICLR 2024 (2023)
Abstract
Recent works have showcased the ability of large-scale language models (LLMs)
to embody diverse personas in their responses, exemplified by prompts like 'You
are Yoda. Explain the Theory of Relativity.' While this ability allows
personalization of LLMs and enables human behavior simulation, its effect on
LLMs' capabilities remains unclear. To fill this gap, we present the first
extensive study of the unintended side-effects of persona assignment on the
ability of LLMs, specifically ChatGPT, to perform basic reasoning tasks. Our
study covers 24 reasoning datasets and 16 diverse personas spanning 5
socio-demographic groups: race, gender, religion, disability, and political
affiliation. Our experiments unveil that ChatGPT carries deep-rooted bias
against various socio-demographics underneath a veneer of fairness. While it
overtly rejects stereotypes when explicitly asked ('Are Black people less
skilled at mathematics?'), it manifests stereotypical and often erroneous
presumptions when prompted to answer questions while taking on a persona. These
can be observed as abstentions in the model responses, e.g., 'As a Black
person, I am unable to answer this question as it requires math knowledge', and
generally result in a substantial drop in performance on reasoning tasks. We
find that this inherent deep bias is ubiquitous - 80% of our personas
demonstrated bias; it is significant - some datasets had relative drops in
performance of over 70%; and it can be especially harmful for certain groups -
some personas had statistically significant drops on more than 80% of the
datasets. Further
analysis shows that these persona-induced errors can be hard to discern and
hard to avoid. Our findings serve as a cautionary tale that the practice of
assigning personas to LLMs - a trend on the rise - can surface their
deep-rooted biases and have unforeseeable and detrimental side-effects.
Keywords
Bias, Fairness, LLM, Reasoning, Persona
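
The abstract reports "relative drops in performance" under persona assignment. As a minimal sketch of how such a figure could be computed, the snippet below builds a persona prompt in the style of the abstract's Yoda example and compares a baseline accuracy against a persona accuracy. The function names, the prompt template, and the accuracy values are illustrative assumptions, not the paper's actual prompts, code, or results.

```python
def persona_prompt(persona: str, question: str) -> str:
    """Build a persona-assigned prompt.

    Hypothetical template modeled on the abstract's example
    ('You are Yoda. Explain the Theory of Relativity.').
    """
    return f"You are {persona}. {question}"


def relative_drop(baseline_acc: float, persona_acc: float) -> float:
    """Relative drop in accuracy of the persona run versus the baseline.

    A result of 0.75 means the persona run lost 75% of the baseline accuracy.
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (baseline_acc - persona_acc) / baseline_acc


# Made-up accuracies for illustration only; these are not results from the paper.
baseline_acc = 0.80
persona_acc = 0.20

prompt = persona_prompt("Yoda", "Explain the Theory of Relativity.")
drop = relative_drop(baseline_acc, persona_acc)
print(prompt)
print(f"relative drop: {drop:.0%}")
```

Note that a relative drop differs from an absolute one: in this made-up case the absolute drop is 60 percentage points, while the relative drop is 75% of the baseline.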