
An evaluation of synthetic data augmentation for mitigating covariate bias in health data

Patterns (2024)

Abstract
Data bias is a major concern in biomedical research, especially when evaluating large-scale observational datasets. It leads to imprecise predictions and inconsistent estimates in standard regression models. We compare the performance of commonly used bias-mitigating approaches (resampling, algorithmic, and post hoc approaches) against a synthetic data-augmentation method that utilizes sequential boosted decision trees to synthesize under-represented groups. The approach is called synthetic minority augmentation (SMA). Through simulations and analysis of real health datasets on a logistic regression workload, the approaches are evaluated across various bias scenarios (types and severity levels). Performance was assessed based on area under the curve, calibration (Brier score), precision of parameter estimates, confidence interval overlap, and fairness. Overall, SMA produces the closest results to the ground truth in low to medium bias (50% or less missing proportion). In high bias (80% or more missing proportion), the advantage of SMA is not obvious, with no specific method consistently outperforming others.
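Three of the evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the confidence interval overlap formula used here (the average of the shared length relative to each interval's own length) is one common definition that is assumed for illustration, since the abstract does not state which definition the paper uses.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a random positive
    case receives a higher score than a random negative case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ci_overlap(ci_a: Tuple[float, float], ci_b: Tuple[float, float]) -> float:
    """Assumed definition: average of the shared length divided by each
    interval's length. Equals 1.0 for identical intervals and becomes
    negative when the intervals are disjoint."""
    shared = min(ci_a[1], ci_b[1]) - max(ci_a[0], ci_b[0])
    return 0.5 * (shared / (ci_a[1] - ci_a[0]) + shared / (ci_b[1] - ci_b[0]))


# Toy data, invented purely for illustration.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_prob))
print(brier_score(y_true, y_prob))
print(ci_overlap((0.0, 2.0), (1.0, 3.0)))
```

In a comparison like the one described, such metrics would be computed for a model fitted on the bias-mitigated data and compared against the same metrics from a model fitted on unbiased ground-truth data; a lower Brier score and a higher AUC and interval overlap indicate results closer to the ground truth.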
Keywords
covariate imbalance, data bias, classification, synthetic data generation, fairness, generative model