The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse
CoRR (2024)
Abstract
Although model editing has shown promise in revising knowledge in Large
Language Models (LLMs), its impact on the inherent capabilities of LLMs is
often overlooked. In this work, we reveal a critical phenomenon: even a single
edit can trigger model collapse, manifesting as significant performance
degradation across various benchmark tasks. However, benchmarking LLMs after each
edit, while necessary to prevent such collapses, is impractically
time-consuming and resource-intensive. To mitigate this, we propose using
perplexity as a surrogate metric, validated by extensive experiments
demonstrating its strong correlation with downstream task performance. We
further conduct an in-depth study on sequential editing, a practical setting
for real-world scenarios, across various editing methods and LLMs, focusing on
hard cases from our previous single edit studies. The results indicate that
nearly all examined editing methods result in model collapse after only a few
edits. To facilitate further research, we used ChatGPT to develop a
new dataset, HardCF, based on those hard cases. This dataset aims to establish
the foundation for pioneering research in reliable model editing and the
mechanisms underlying editing-induced model collapse. We hope this work can
draw the community's attention to the potential risks inherent in model editing
practices.
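The abstract proposes perplexity as a cheap stand-in for full benchmarking after each edit. As a reminder of what that metric measures, the sketch below computes perplexity from per-token natural-log probabilities, PPL = exp(-(1/N) Σ log p(x_i | x_<i)), and then monitors a sequence of edits. It is an illustrative sketch under stated assumptions, not the authors' implementation. The functions `flags_collapse` and `first_collapse` are hypothetical, and the 2x growth threshold is an arbitrary placeholder because the abstract does not specify a collapse criterion. The log-probabilities are assumed to come from scoring a fixed reference text with the model before and after editing.

```python
import math
from typing import Optional, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def flags_collapse(baseline_ppl: float, edited_ppl: float, ratio: float = 2.0) -> bool:
    """Hypothetical check: flag the edited model if its perplexity on the
    reference text exceeds `ratio` times the unedited model's perplexity.
    The default ratio of 2.0 is an illustrative assumption."""
    return edited_ppl > ratio * baseline_ppl


def first_collapse(
    baseline_ppl: float,
    ppl_after_each_edit: Sequence[float],
    ratio: float = 2.0,
) -> Optional[int]:
    """Hypothetical sequential-editing monitor: return the 0-based index of the
    first edit whose resulting perplexity trips `flags_collapse`, or None."""
    for i, ppl in enumerate(ppl_after_each_edit):
        if flags_collapse(baseline_ppl, ppl, ratio):
            return i
    return None


# Usage: four tokens, each predicted with probability 0.5, give perplexity 2.
print(perplexity([math.log(0.5)] * 4))
# Illustrative (made-up) perplexities after four sequential edits.
print(first_collapse(10.0, [10.5, 12.0, 40.0, 300.0]))  # index of first flagged edit
```

Comparing against the unedited model's own perplexity, rather than an absolute value, keeps the check independent of the reference text's difficulty; checking one cheap number after every edit is what makes this practical where rerunning benchmarks would not be.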