INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection
CoRR (2024)
Abstract
Knowledge hallucinations have raised widespread concerns about the security and
reliability of deployed LLMs. Previous efforts at detecting hallucinations have
relied on logit-level uncertainty estimation or language-level
self-consistency evaluation, where semantic information is inevitably lost
during the token-decoding procedure. Thus, we propose to explore the dense
semantic information retained within LLMs' INternal States
for hallucInation DEtection (INSIDE). In particular,
a simple yet effective EigenScore metric is proposed to better
evaluate responses' self-consistency, which exploits the eigenvalues of
responses' covariance matrix to measure the semantic consistency/diversity in
the dense embedding space. Furthermore, from the perspective of self-consistent
hallucination detection, a test-time feature clipping approach is explored to
truncate extreme activations in the internal states, which reduces
overconfident generations and potentially benefits the detection of
overconfident hallucinations. Extensive experiments and ablation studies are
performed on several popular LLMs and question-answering (QA) benchmarks,
showing the effectiveness of our proposal.
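The abstract characterizes EigenScore as a consistency measure built from the eigenvalues (equivalently, the log-determinant) of the covariance matrix of sampled-response embeddings, paired with a test-time clipping of extreme internal activations. Below is a minimal sketch of both ideas, assuming NumPy and a (K, d) matrix holding one sentence embedding per sampled response; the embedding layer, the regularization constant, and the quantile thresholds are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np


def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """Sketch of an EigenScore-style self-consistency metric.

    `embeddings` has shape (K, d): one sentence embedding per sampled
    response (e.g., a hidden state from an intermediate LLM layer; the
    layer/pooling choice is an assumption). The score is the regularized
    log-determinant of the covariance of the K embeddings: larger values
    mean more semantic diversity across samples, i.e., a higher chance
    that the answer is hallucinated.
    """
    K, _ = embeddings.shape
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # K x K Gram matrix of centered embeddings shares the nonzero
    # eigenvalues of the d x d covariance and is cheaper when d >> K.
    cov = centered @ centered.T
    # Regularize so the determinant is well defined even when rank < K.
    cov += alpha * np.eye(K)
    _, logdet = np.linalg.slogdet(cov)
    return float(logdet) / K


def clip_features(hidden: np.ndarray,
                  low_q: float = 0.01,
                  high_q: float = 0.99) -> np.ndarray:
    """Sketch of test-time feature clipping: truncate extreme activations
    to per-dimension quantile thresholds (the quantile choice is an
    assumption) to temper overconfident generations."""
    lo = np.quantile(hidden, low_q, axis=0, keepdims=True)
    hi = np.quantile(hidden, high_q, axis=0, keepdims=True)
    return np.clip(hidden, lo, hi)
```

In this reading, hallucination detection compares the EigenScore of several sampled answers against a threshold tuned on a validation set; the clipping step is applied to internal states before decoding so that overconfident (low-diversity yet wrong) generations become easier to separate.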
Keywords
Large Language Models, Hallucination Detection, LogDet of Covariance Matrix, Eigenvalues, Feature Clipping