Bias and Class Imbalance in Oncologic Data-Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets

CANCERS(2022)

引用 20|浏览8
暂无评分
摘要
Simple Summary Large-scale medical data carries significant areas of underrepresentation and bias at all levels: clinical, biological, and management. Resulting data sets and outcome measures reflect these shortcomings in clinical, imaging, and omics data with class imbalance emerging as the single most significant issue inhibiting meaningful and reproducible conclusions while impacting the transfer of findings between the lab and clinic and limiting improvement in patient outcomes. When employing artificial intelligence methods, class imbalance can produce classifiers whose predicted class probabilities are geared toward the majority class ignoring the significance of minority classes, in turn generating algorithmic bias. The inability to mitigate this can guide an AI system in favor of or against various cohorts or variables. We review sources of bias and class imbalance and relate this to AI methods. We discuss avenues to mitigate these and propose a set of guidelines aimed at limiting and addressing data and algorithmic bias. Recent technological developments have led to an increase in the size and types of data in the medical field derived from multiple platforms such as proteomic, genomic, imaging, and clinical data. Many machine learning models have been developed to support precision/personalized medicine initiatives such as computer-aided detection, diagnosis, prognosis, and treatment planning by using large-scale medical data. Bias and class imbalance represent two of the most pressing challenges for machine learning-based problems, particularly in medical (e.g., oncologic) data sets, due to the limitations in patient numbers, cost, privacy, and security of data sharing, and the complexity of generated data. Depending on the data set and the research question, the methods applied to address class imbalance problems can provide more effective, successful, and meaningful results. This review discusses the essential strategies for addressing and mitigating the class imbalance problems for different medical data types in the oncologic domain.
更多
查看译文
关键词
class imbalance, oncology, machine learning, artificial intelligence, clinical data
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要