Use of Response Permutation to Measure an Imaging Dataset’s Susceptibility to Overfitting by Selected Standard Analysis Pipelines

Academic Radiology (2024)

Abstract
Rationale and Objectives: This study demonstrates a method for quantifying how much overfitting inflates the area under the receiver operating characteristic curve (AUC) when standard analysis pipelines are used to develop imaging biomarkers. We illustrate the approach using two publicly available repositories of radiology and pathology images for breast cancer diagnosis.

Materials and Methods: For each dataset, we permuted the outcome (cancer diagnosis) values to eliminate any true association between imaging features and outcome. Seven types of classification models (logistic regression, linear discriminant analysis, Naïve Bayes, linear support vector machines, nonlinear support vector machines, random forest, and multi-layer perceptron) were fitted to each scrambled dataset and evaluated with each of four techniques (all data, hold-out, 10-fold cross-validation, and bootstrapping). We repeated this process for 50 outcome permutations and averaged the resulting AUCs. Because permutation removes any true signal, any increase over the null AUC of 0.5 can be attributed to overfitting.

Results: Varying the sample size and the number of imaging features, we found that failing to control for overfitting could produce near-perfect apparent prediction (AUC near 1.0). Cross-validation offered greater protection against overfitting than the other evaluation techniques. For most classification algorithms, a sample size of at least 200 was required to assess as few as 10 features with less than 0.05 of AUC inflation attributable to overfitting.

Conclusion: This approach could be applied to any curated dataset to suggest how many features, and which analysis approaches, can be used while limiting overfitting.
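A minimal sketch of the permutation idea in pure Python, under loudly stated assumptions: random Gaussian features stand in for the breast-cancer imaging features, and a hypothetical nearest-centroid scorer (projection onto the difference of class means) stands in for the seven classifiers the study actually fitted. It compares only two of the four evaluation techniques, "all data" (fit and score on the same cases) and stratified 5-fold cross-validation; hold-out and bootstrapping are omitted, and the study itself used 10 folds.

```python
import random
import statistics


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_centroid(X, y):
    """Hypothetical stand-in classifier: direction = mean(positives) - mean(negatives)."""
    dim = len(X[0])
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    mean_pos = [sum(x[j] for x in pos) / len(pos) for j in range(dim)]
    mean_neg = [sum(x[j] for x in neg) / len(neg) for j in range(dim)]
    return [a - b for a, b in zip(mean_pos, mean_neg)]


def score(w, X):
    """Linear score x . w; any constant offset is irrelevant because AUC is rank-based."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]


def stratified_folds(y, k, rng):
    """Split indices into k folds so each fold keeps both classes."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[rank % k].append(i)
    return folds


def permutation_null_auc(X, y, n_perm=50, k=5, seed=0):
    """Average AUC over outcome permutations for all-data vs k-fold CV evaluation."""
    rng = random.Random(seed)
    all_data, cv = [], []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any true feature-outcome association
        # "All data": fit and evaluate on the same cases (no overfitting control).
        w = fit_centroid(X, y_perm)
        all_data.append(auc(score(w, X), y_perm))
        # k-fold cross-validation: average the held-out AUC of each fold.
        fold_aucs = []
        for test_idx in stratified_folds(y_perm, k, rng):
            test_set = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in test_set]
            w = fit_centroid([X[i] for i in train_idx], [y_perm[i] for i in train_idx])
            fold_aucs.append(auc(score(w, [X[i] for i in test_idx]),
                                 [y_perm[i] for i in test_idx]))
        cv.append(statistics.mean(fold_aucs))
    # Under the permutation null, any excess over 0.5 reflects overfitting.
    return statistics.mean(all_data), statistics.mean(cv)


if __name__ == "__main__":
    rng = random.Random(42)
    n, p = 40, 50  # hypothetical sizes: 40 cases, 50 features
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    inflated, protected = permutation_null_auc(X, y)
    print(f"all-data AUC: {inflated:.3f}, 5-fold CV AUC: {protected:.3f}")
```

With more features than cases, as in this hypothetical setup, one would expect the all-data AUC to sit well above the null of 0.5 while the cross-validated AUC stays close to 0.5. Per-fold AUCs are averaged rather than pooled across folds, since pooling out-of-fold scores can bias a null AUC away from 0.5.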
Key words
Machine learning, Bias, AUC, Overfitting, Radiomics, Classifier performance