Identifying Imbalance Thresholds in Input Data to Achieve Desired Levels of Algorithmic Fairness.

Big Data (2022)

Abstract
Software bias has emerged as a relevant issue in recent years, in conjunction with the growing adoption of software automation in a variety of organizational and production processes of our society, especially in decision-making. Among the causes of software bias, data imbalance is one of the most significant. In this paper, we treat imbalance in datasets as a risk factor for software bias. Specifically, we define a methodology to identify thresholds for balance measures that serve as meaningful risk indicators of unfair classification output. We apply the methodology to a large number of data mutations with different classification tasks, testing all possible balance-unfairness-algorithm combinations. The results show that, on average, the thresholds accurately identify the risk of unfair output. In certain cases they even tend to overestimate the risk: although such behavior could support a prudential approach to software discrimination, further work will be devoted to better assessing the reliability of the thresholds. The proposed methodology is generic and can be applied to different datasets, algorithms, and context-specific thresholds.
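To illustrate the general idea of using a balance measure with a threshold as a risk indicator, the following sketch computes a Shannon-entropy-based balance score for a label column and flags the dataset when the score falls below a threshold. Both the specific measure and the threshold value here are illustrative assumptions, not the measures or thresholds derived in the paper.

```python
from collections import Counter
from math import log

def shannon_balance(labels):
    """Shannon-entropy balance of a label column.

    Returns 1.0 for perfectly balanced classes and values
    approaching 0.0 for heavily imbalanced ones.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class is maximally imbalanced
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(k)  # normalize by max entropy log(k)

# Hypothetical threshold: balance values below it are treated as
# a risk of unfair classification output (illustrative value only).
BALANCE_THRESHOLD = 0.7

labels = ["approved"] * 90 + ["denied"] * 10
balance = shannon_balance(labels)
at_risk = balance < BALANCE_THRESHOLD
```

In this toy example the 90/10 split yields a balance score of roughly 0.47, below the assumed threshold, so the dataset would be flagged for closer inspection before training a classifier on it.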
Keywords
Data bias, Data imbalance, Algorithmic fairness, Risk analysis, Automated decision-making