IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
arXiv (2024)
Abstract
It is widely acknowledged that large language models (LLMs) encode a vast
reservoir of knowledge after being trained on massive data. Recent studies
disclose knowledge conflicts in LLM generation, wherein outdated or incorrect
parametric knowledge (i.e., encoded knowledge) contradicts new knowledge
provided in the context. To mitigate such knowledge conflicts, we propose a
novel framework, IRCAN (Identifying and Reweighting Context-Aware Neurons), to
capitalize on neurons that are crucial in processing contextual cues.
Specifically, IRCAN first identifies neurons that significantly contribute to
context processing, utilizing a context-aware attribution score derived from
integrated gradients. Subsequently, the identified context-aware neurons are
strengthened via reweighting. In doing so, we steer LLMs to generate
context-sensitive outputs with respect to the new knowledge provided in the
context. Extensive experiments conducted across a variety of models and tasks
demonstrate that IRCAN not only achieves remarkable improvements in handling
knowledge conflicts but also offers a scalable, plug-and-play solution that can
be integrated seamlessly with existing models.
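
The identify-then-reweight idea described above can be illustrated with a minimal toy sketch. This is not the authors' implementation. It assumes a single layer of three "neurons" feeding a two-token softmax, an integrated-gradients path that runs from the activations without context to the activations with context, and reweighting done by scaling the outgoing weights of the selected neurons by a factor `beta`. All names (`W_out`, `h_no_ctx`, `h_ctx`, `context_attribution`, `top_neurons`, `reweight`, `beta`) and all numbers are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def probs(h, W_out):
    # Token probabilities given neuron activations h (toy output layer).
    logits = [sum(w * a for w, a in zip(row, h)) for row in W_out]
    return softmax(logits)

def grad_target_prob(h, W_out, target):
    # Analytic gradient of p[target] with respect to each activation h[i].
    p = probs(h, W_out)
    grads = []
    for i in range(len(h)):
        expected_w = sum(p[j] * W_out[j][i] for j in range(len(W_out)))
        grads.append(p[target] * (W_out[target][i] - expected_w))
    return grads

def context_attribution(h_no_ctx, h_ctx, W_out, target, steps=50):
    # Integrated gradients (midpoint Riemann sum) along the straight path
    # from the no-context activations to the with-context activations.
    totals = [0.0] * len(h_ctx)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        h = [a + alpha * (b - a) for a, b in zip(h_no_ctx, h_ctx)]
        g = grad_target_prob(h, W_out, target)
        totals = [t + gi for t, gi in zip(totals, g)]
    return [(b - a) * t / steps for a, b, t in zip(h_no_ctx, h_ctx, totals)]

def top_neurons(scores, k):
    # Indices of the k neurons with the highest attribution scores.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def reweight(W_out, neuron_ids, beta):
    # Assumed reweighting rule: scale the selected neurons' outgoing weights.
    ids = set(neuron_ids)
    return [[w * beta if i in ids else w for i, w in enumerate(row)]
            for row in W_out]

# Toy setup: token 0 is the parametric answer, token 1 the in-context answer.
W_out = [[1.0, 0.5, 0.0],
         [0.0, 0.5, 1.0]]
h_no_ctx = [1.0, 1.0, 0.2]   # activations for the question alone
h_ctx = [1.0, 1.0, 0.8]      # activations for context + question
TARGET = 1

scores = context_attribution(h_no_ctx, h_ctx, W_out, TARGET)
top = top_neurons(scores, k=1)
W_new = reweight(W_out, top, beta=3.0)
p_before = probs(h_ctx, W_out)[TARGET]
p_after = probs(h_ctx, W_new)[TARGET]
```

In this toy setting only the third neuron changes when context is added, so it receives all of the attribution, and amplifying its outgoing weights raises the probability of the in-context answer. A real model would obtain the gradients through automatic differentiation rather than the closed form used here.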