Fairness Without Harm: An Influence-Guided Active Sampling Approach
CoRR (2024)
Abstract
The pursuit of fairness in machine learning (ML), i.e., ensuring that models do
not exhibit bias toward protected demographic groups, typically involves a
trade-off. This trade-off can be described by a Pareto frontier: given fixed
resources (e.g., data), reducing fairness violations often comes at the cost of
lower model accuracy. In this work, we aim to
train models that mitigate group fairness disparity without causing harm to
model accuracy. Intuitively, acquiring more data is a natural and promising
approach to achieve this goal by reaching a better Pareto frontier of the
fairness-accuracy tradeoff. Existing data acquisition methods, such as fair
active learning approaches, typically require annotating sensitive attributes.
However, these sensitive attribute annotations should be protected due to
privacy and safety concerns. In this paper, we propose a tractable active data
sampling algorithm that does not rely on group annotations for the training
data and instead requires them only on a small validation set. Specifically, the
algorithm first scores each new example by its influence on fairness and
accuracy, as evaluated on the validation set, and then selects a given number
of examples for training. We theoretically analyze how acquiring more data can
improve fairness without causing harm, and demonstrate the feasibility of our
sampling approach in the context of risk disparity. We also derive upper bounds
on the generalization error and the risk disparity, and establish the
connections between them. Extensive experiments on real-world data demonstrate the
effectiveness of our proposed algorithm.
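The scoring-and-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it rests on assumptions of our own: a logistic-regression model, a first-order influence approximation (the predicted effect of one gradient step of size `lr` on a single candidate), candidate labels known at scoring time, a binary sensitive attribute available only on the validation set, and risk disparity taken as the absolute gap between the two groups' average validation losses. All function names (`per_example_grads`, `avg_loss`, `influence_scores`, `select_examples`) and the parameters `lr` and `k` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: logistic regression + first-order (one-step) influence.
# Not the paper's implementation; names and parameters are illustrative.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def per_example_grads(theta, X, y):
    # Gradient of the logistic loss for each row: (sigmoid(x . theta) - y) * x
    p = sigmoid(X @ theta)
    return (p - y)[:, None] * X


def avg_loss(theta, X, y, eps=1e-12):
    # Average binary cross-entropy of the model on (X, y)
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def influence_scores(theta, X_cand, y_cand, X_val, y_val, g_val, lr=0.1):
    """Predicted (first-order) change in validation loss and in |risk disparity|
    if one gradient step of size lr were taken on each candidate alone.
    Negative values mean improvement."""
    delta = -lr * per_example_grads(theta, X_cand, y_cand)  # (n_cand, d)

    # Accuracy proxy: average loss over the whole validation set
    grad_val = per_example_grads(theta, X_val, y_val).mean(axis=0)
    acc_change = delta @ grad_val

    # Assumed disparity D = L_1 - L_0 (group losses on the validation set);
    # the change in |D| is approximately sign(D) * grad(D) . delta
    m1, m0 = g_val == 1, g_val == 0
    disparity = avg_loss(theta, X_val[m1], y_val[m1]) - avg_loss(theta, X_val[m0], y_val[m0])
    grad_disp = (per_example_grads(theta, X_val[m1], y_val[m1]).mean(axis=0)
                 - per_example_grads(theta, X_val[m0], y_val[m0]).mean(axis=0))
    fair_change = np.sign(disparity) * (delta @ grad_disp)
    return acc_change, fair_change


def select_examples(theta, X_cand, y_cand, X_val, y_val, g_val, k, lr=0.1):
    """Return indices of up to k candidates predicted not to increase the
    validation loss, ordered by the largest predicted reduction in |D|."""
    acc_change, fair_change = influence_scores(
        theta, X_cand, y_cand, X_val, y_val, g_val, lr)
    no_harm = np.flatnonzero(acc_change <= 0)          # "without harm" filter
    ranked = no_harm[np.argsort(fair_change[no_harm], kind="stable")]
    return ranked[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.5, -1.0, 0.5])
    X_val = rng.normal(size=(200, 3))
    y_val = (rng.random(200) < sigmoid(X_val @ w_true)).astype(float)
    g_val = rng.integers(0, 2, size=200)  # sensitive attribute, validation only
    X_cand = rng.normal(size=(50, 3))
    y_cand = (rng.random(50) < sigmoid(X_cand @ w_true)).astype(float)
    theta = np.array([0.2, -0.1, 0.3])    # current, partially trained model
    print(select_examples(theta, X_cand, y_cand, X_val, y_val, g_val, k=5))
```

Filtering on `acc_change <= 0` before ranking by `fair_change` is just one simple way to encode "improve fairness without harming accuracy"; the paper's actual scoring rule and influence estimator may combine the two scores differently or use another approximation.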