Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
CoRR (2024)
Abstract
Existing debiasing methods inevitably make unreasonable or undesired
predictions because they are designed and evaluated to achieve parity across
different social groups while leaving individual facts aside, and thus end up
modifying existing knowledge. In this paper, we first establish a new bias
mitigation benchmark, BiasKE, leveraging existing and newly constructed
datasets, which systematically assesses debiasing performance with
complementary metrics on fairness, specificity, and generalization. We further
propose a novel debiasing method, Fairness Stamp (FAST), which enables editable
fairness through fine-grained calibration of individual pieces of biased
knowledge. Comprehensive experiments demonstrate that FAST surpasses
state-of-the-art baselines with remarkable debiasing performance without
hampering the model's overall capability for knowledge preservation,
highlighting the promise of fine-grained debiasing strategies for editable
fairness in LLMs.