InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

Conference on Empirical Methods in Natural Language Processing (2023)

Abstract
Debiasing methods in NLP models traditionally focus on isolating information related to a sensitive attribute (such as gender or race). We argue instead that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We show that an interactive setup in which users can provide feedback achieves a better, fairer balance between task performance and bias mitigation, supported by faithful explanations.
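To make the interactive setup concrete, here is a minimal sketch of a feedback loop of the kind the abstract describes: a toy model exposes a token-importance explanation, a user flags tokens that encode a sensitive attribute, and the flagged tokens are down-weighted before re-prediction. This is an illustration under assumptions, not the paper's InterFair method; all names (TOY_WEIGHTS, predict_with_explanation, apply_feedback) are hypothetical.

```python
# Hypothetical sketch of an interactive debiasing loop (not the paper's
# actual algorithm): predict with a per-token explanation, collect user
# feedback on sensitive tokens, down-weight them, and re-predict.

# Toy bag-of-words weights standing in for a learned model.
TOY_WEIGHTS = {"nurse": 0.9, "she": 0.7, "hospital": 0.8, "patient": 0.6}


def predict_with_explanation(tokens):
    """Score = sum of token weights; explanation = per-token contributions."""
    contributions = {t: TOY_WEIGHTS.get(t, 0.0) for t in tokens}
    return sum(contributions.values()), contributions


def apply_feedback(flagged, damp=0.1):
    """Down-weight tokens the user marked as tied to a sensitive attribute."""
    for t in flagged:
        if t in TOY_WEIGHTS:
            TOY_WEIGHTS[t] *= damp


tokens = ["she", "is", "a", "nurse", "at", "the", "hospital"]
score, explanation = predict_with_explanation(tokens)
print("before feedback:", score, explanation)

# The user inspects the explanation and flags 'she' as gender-related,
# so its influence is reduced rather than the token being deleted outright.
apply_feedback(flagged=["she"])
score, explanation = predict_with_explanation(tokens)
print("after feedback:", score, explanation)
```

Down-weighting rather than deleting reflects the abstract's point that sensitive information should be used 'fairly' instead of blindly eliminated; how much to dampen is exactly the subjective balance the user's feedback supplies.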