Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination
CoRR (2024)
Abstract
While displaying impressive generation capabilities across many tasks, Large
Language Models (LLMs) still struggle with crucial issues of privacy violation
and unwanted exposure of sensitive data. This raises an essential question: how
should we prevent such undesired behavior of LLMs while maintaining their
strong generation and natural language understanding (NLU) capabilities? In
this work, we introduce a novel approach termed deliberate imagination in the
context of LLM unlearning. Instead of trying to forget memorized data, we
employ a self-distillation framework, guiding LLMs to deliberately imagine
alternative scenarios. As demonstrated in a wide range of experiments, the
proposed method not only effectively unlearns targeted text but also preserves
the LLMs' capabilities in open-ended generation tasks as well as in NLU tasks.
Our results demonstrate the usefulness of this approach across different models
and sizes, and also with parameter-efficient fine-tuning, offering a novel
pathway to addressing the challenges with private and sensitive data in LLM
applications.
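To make the idea of "deliberate imagination" concrete, here is a minimal sketch of how such a self-distillation target might be constructed. This is an illustration under assumptions, not the paper's exact objective: the function names (`imagination_target`, `kl_divergence`), the high-temperature flattening, and the hard zeroing of the memorized token are all hypothetical simplifications of guiding a student model toward alternative continuations instead of the memorized one.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature knob."""
    z = logits / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def imagination_target(teacher_logits, memorized_id, temperature=5.0):
    """Hypothetical 'imagined' distribution for self-distillation.

    A high temperature flattens the teacher's next-token distribution,
    and the memorized token's probability is removed, so the remaining
    mass shifts to alternative ('imagined') continuations.
    """
    p = softmax(teacher_logits, temperature)
    p[memorized_id] = 0.0
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) — the distillation loss the student would minimize."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy example: a 4-token vocabulary where token 0 is the memorized one.
teacher_logits = np.array([4.0, 1.0, 0.5, 0.2])
target = imagination_target(teacher_logits, memorized_id=0)
student = softmax(np.array([0.1, 0.9, 0.4, 0.3]))
loss = kl_divergence(target, student)
```

Training the student against `target` with this loss would steer it away from reproducing the memorized token while keeping a valid distribution over the rest of the vocabulary.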