Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis
CoRR (2024)
Abstract
The humanlike responses of large language models (LLMs) have prompted social
scientists to investigate whether LLMs can be used to simulate human
participants in experiments, opinion polls and surveys. A central focus of
this line of research has been mapping out the psychological profiles of LLMs
by prompting them to respond to standardized questionnaires. The conflicting
findings of this research are unsurprising, given that mapping out underlying,
or latent, traits from LLMs' text responses to questionnaires is no easy task.
To address this, we use psychometrics, the science of psychological
measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and
GPT-4, to assume different personas and respond to a range of standardized
measures of personality constructs. We use two kinds of persona descriptions:
either generic (four or five random person descriptions) or specific (mostly
demographics of actual humans from a large-scale human dataset). We find that,
with generic persona descriptions, the responses from GPT-4 (but not GPT-3.5)
show promising, albeit not perfect, psychometric properties similar to human
norms; with specific demographic profiles, however, the data from both LLMs
show poor psychometric properties. We conclude that, currently, when LLMs are
asked to simulate silicon personas, their responses are poor signals of
potentially underlying latent traits. Thus, our work casts doubt on LLMs'
ability to simulate individual-level human behaviour across multiple-choice
question answering tasks.
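As a rough illustration of what checking a "psychometric property" can involve, the sketch below computes Cronbach's alpha, a standard internal-consistency (reliability) index, over a matrix of questionnaire responses where each row is one persona and each column is one item of a scale. This is an assumption for illustration only: the abstract does not say which statistics the authors computed, and the function name `cronbach_alpha` and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix of numeric scores.

    Hypothetical helper for illustration: rows are respondents (e.g. simulated
    personas), columns are items of a single scale. Uses sample variance
    (n - 1 denominator) consistently for items and totals.
    """
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's total (summed) score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    # alpha = k / (k - 1) * (1 - sum of item variances / total-score variance)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): three personas whose two items move together.
consistent = [[1, 1], [2, 2], [3, 3]]
# Toy data (hypothetical): two items that do not covary at all.
unrelated = [[1, 2], [2, 1], [1, 1], [2, 2]]

print(round(cronbach_alpha(consistent), 3))
print(round(cronbach_alpha(unrelated), 3))
```

Values near 1 indicate that the items covary as if driven by one latent trait, while values near 0 indicate that the item responses carry little shared signal, which is the kind of weak-signal pattern the abstract describes for persona-based LLM responses.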