Large Scale Knowledge Washing
CoRR (2024)
Abstract
Large language models show impressive abilities in memorizing world
knowledge, which leads to concerns regarding memorization of private
information, toxic or sensitive knowledge, and copyrighted content. We
introduce the problem of Large Scale Knowledge Washing, focusing on
"unlearning" extensive amounts of factual knowledge. Previous unlearning
methods usually define the reverse loss and update the model via
backpropagation, which may affect the model's fluency and reasoning ability or
even destroy the model due to extensive training with the reverse loss.
Existing works introduce additional data from downstream tasks to prevent the
model from losing capabilities, which requires downstream task awareness.
Controlling the tradeoff of unlearning and maintaining existing capabilities is
also challenging. To this end, we propose LAW (Large Scale Washing) to update
the MLP layers in decoder-only large language models to perform knowledge
washing, as inspired by model editing methods and based on the hypothesis that
knowledge and reasoning are disentanglable. We derive a new objective with the
knowledge to be unlearned to update the weights of certain MLP layers.
Experimental results demonstrate the effectiveness of LAW in forgetting target
knowledge while maintaining reasoning ability. The code will be open-sourced at
https://github.com/wangyu-ustc/LargeScaleWashing.
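
To illustrate the "reverse loss" baseline that the abstract contrasts LAW with, here is a minimal sketch. It is not the authors' method and not taken from their code. It assumes that the reverse loss is the log-likelihood of the target token, minimized by gradient descent, and it uses a toy model that is just a vector of logits over three tokens. The names `reverse_loss_step` and `unlearn` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_loss_step(logits, target, lr=0.5):
    """One gradient-descent step on the assumed reverse loss log p(target)."""
    probs = softmax(logits)
    # Gradient of log p(target) w.r.t. logit i is (1[i == target] - p_i).
    grad = [(1.0 if i == target else 0.0) - p for i, p in enumerate(probs)]
    # Descending on log p(target) pushes the target's probability down.
    return [z - lr * g for z, g in zip(logits, grad)]

def unlearn(logits, target, steps=20, lr=0.5):
    """Repeat the reverse-loss step; nothing bounds how far the logits drift."""
    for _ in range(steps):
        logits = reverse_loss_step(logits, target, lr)
    return logits

# Toy "model" that strongly prefers token 0 (standing in for a memorized fact).
logits = [2.0, 0.5, 0.1]
before = softmax(logits)[0]
after = softmax(unlearn(logits, target=0))[0]
print(f"p(target) before: {before:.3f}, after: {after:.3f}")
```

Because log p(target) has no lower bound, continued training of this kind keeps moving the parameters. In this toy setting that is consistent with the abstract's concern that extensive training with the reverse loss can damage a model.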