Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
CoRR (2024)
Abstract
Large Language Models (LLMs) have shown impressive capabilities but still
suffer from the issue of hallucinations. A significant type of this issue is
the false premise hallucination, which we define as the phenomenon of LLMs
generating hallucinated text when confronted with false premise questions. In
this paper, we perform a comprehensive analysis of the false premise
hallucination and elucidate its internal working mechanism: a small subset of
attention heads (which we designate as false premise heads) disturbs the
knowledge extraction process, leading to the occurrence of false premise
hallucinations. Based on our analysis, we propose FAITH (False
premise Attention head constraIning for miTigating
Hallucinations), a novel and effective method that mitigates false
premise hallucinations by constraining the false premise attention heads during
model inference. Impressively, extensive experiments demonstrate
that constraining only approximately 1% of the attention heads in the model
yields a notable improvement of nearly 20% in model performance.
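The abstract does not include the paper's implementation; as a rough illustration of what constraining a small set of attention heads at inference time can look like, the sketch below zeroes out a few hypothetical (layer, head) pairs via forward pre-hooks on a LLaMA-style model in PyTorch/transformers. The model name, the module path `model.model.layers[i].self_attn.o_proj`, and the head indices are assumptions for illustration, not the FAITH configuration.

```python
from collections import defaultdict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical (layer, head) pairs standing in for identified "false premise heads".
FALSE_PREMISE_HEADS = {(12, 3), (12, 7), (20, 1)}
MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed LLaMA-style model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
model.eval()

head_dim = model.config.hidden_size // model.config.num_attention_heads


def make_head_mask_hook(head_indices):
    """Zero the per-head slices of the attention output before the output projection."""
    def hook(module, args):
        hidden = args[0].clone()  # (batch, seq_len, num_heads * head_dim)
        for h in head_indices:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,) + args[1:]
    return hook


# Group heads by layer and register a forward pre-hook on each affected layer's
# output projection, so the chosen heads contribute nothing to the residual stream.
heads_by_layer = defaultdict(list)
for layer, head in FALSE_PREMISE_HEADS:
    heads_by_layer[layer].append(head)

handles = [
    model.model.layers[layer].self_attn.o_proj.register_forward_pre_hook(
        make_head_mask_hook(heads)
    )
    for layer, heads in heads_by_layer.items()
]

# A false premise question: Einstein's Nobel Prize was not awarded for relativity.
prompt = "In which year did Einstein win the Nobel Prize for his theory of relativity?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

for handle in handles:  # remove the hooks to restore unconstrained inference
    handle.remove()
```

In practice, the relevant heads would first be identified by the kind of analysis the paper describes, and the constraint applied at inference could be softer than zeroing the heads outright.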