Systematic comparison of semi-supervised and self-supervised learning for medical image classification
arXiv (Cornell University) (2023)
Abstract
In many medical image classification problems, labeled data is scarce while
unlabeled data is more available. Semi-supervised learning and self-supervised
learning are two different research directions that can improve accuracy by
learning from extra unlabeled data. Recent methods from both directions have
reported significant gains on traditional benchmarks. Yet past benchmarks do
not focus on medical tasks and rarely compare self- and semi-supervised methods together
on equal footing. Furthermore, past benchmarks often handle hyperparameter
tuning suboptimally. First, they may not tune hyperparameters at all, leading
to underfitting. Second, when tuning does occur, it often unrealistically uses
a labeled validation set much larger than the train set. Both cases make
previously published rankings of methods difficult to translate to practical
settings. This study contributes a systematic evaluation of self- and semi-supervised
methods with a unified experimental protocol intended to guide a practitioner
with scarce overall labeled data and a limited compute budget. We answer two
key questions: Can hyperparameter tuning be effective with realistic-sized
validation sets? If so, when all methods are tuned well, which self- or
semi-supervised methods reach the best accuracy? Our study compares 13
representative semi- and self-supervised methods to strong labeled-set-only
baselines on 4 medical datasets. From 20,000+ total GPU hours of computation, we
provide valuable best practices to resource-constrained, results-focused
practitioners.
Key words
learning, images, time frontiers, semi-supervised, self-supervised