On Large Language Models' Hallucination with Regard to Known Facts
CoRR (2024)
Abstract
Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify factual questions that query the same triplet knowledge but result in different answers. The difference between the model's behaviors on the correct and incorrect outputs thus suggests the patterns that arise when hallucinations happen. Second, to measure these patterns, we utilize mappings from the residual streams to the vocabulary space. We reveal the different dynamics of the output token probabilities along the depths of layers between the correct and hallucinated cases. In hallucinated cases, the output token's information rarely demonstrates abrupt increases and consistent superiority in the later stages of the model. Leveraging the dynamic curve as a feature, we build a classifier capable of accurately detecting hallucinatory predictions with an 88% success rate. Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts and, more importantly, on accurately predicting when they are hallucinating.
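
The abstract only sketches the method in prose; a minimal illustration of the layer-wise vocabulary projection it describes (a "logit lens"-style readout of the residual stream) could look like the following. The model choice (gpt2), the example question, and the helper names are illustrative assumptions, not the authors' implementation, and the detector trained on the dynamic curve is stubbed here as a plain logistic regression rather than whatever classifier the paper uses.

```python
# Sketch: project each layer's residual stream into vocabulary space and track how
# the probability of the answer token evolves with depth, then use that per-layer
# curve as a feature for a hallucination detector. Assumptions: gpt2 as the model,
# a toy factoid prompt, and logistic regression as the detector.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def answer_probability_curve(question: str, answer: str) -> list[float]:
    """Probability of the answer's first token, read out at every layer's residual stream."""
    inputs = tokenizer(question, return_tensors="pt")
    answer_id = tokenizer(" " + answer, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    curve = []
    for hidden in out.hidden_states:            # embedding output + one entry per layer
        resid = hidden[0, -1]                   # residual stream at the last token position
        # Map the residual stream to vocabulary space via the final norm and unembedding.
        logits = model.lm_head(model.transformer.ln_f(resid))
        probs = torch.softmax(logits, dim=-1)
        curve.append(probs[answer_id].item())
    return curve


def fit_hallucination_detector(curves, labels):
    """Train a simple classifier on layer-wise probability curves (label 1 = hallucinated)."""
    from sklearn.linear_model import LogisticRegression
    return LogisticRegression(max_iter=1000).fit(curves, labels)


# Example usage: a factoid prompt whose subject-relation-object triplet the model may know.
curve = answer_probability_curve("The capital of France is", "Paris")
```

In correct cases the paper reports that the curve shows a sharp rise and sustained dominance of the output token in later layers, so a classifier over these curves can separate hallucinated from correct generations.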