Revisiting Active Learning in the Era of Vision Foundation Models
CoRR (2024)
Abstract
Foundation vision or vision-language models are trained on large unlabeled or
noisy data and learn robust representations that can achieve impressive zero-
or few-shot performance on diverse tasks. Given these properties, they are a
natural fit for active learning (AL), which aims to maximize labeling
efficiency, but the full potential of foundation models has not been explored
in the context of AL, specifically in the low-budget regime. In this work, we
evaluate how foundation models influence three critical components of effective
AL, namely, 1) initial labeled pool selection, 2) ensuring diverse sampling,
and 3) the trade-off between representative and uncertainty sampling. We
systematically study how the robust representations of foundation models
(DINOv2, OpenCLIP) challenge existing findings in active learning. Our
observations inform the principled construction of a new simple and elegant AL
strategy that balances uncertainty estimated via dropout with sample diversity.
We extensively test our strategy on many challenging image classification
benchmarks, including natural images as well as out-of-domain biomedical images
that are relatively understudied in the AL literature. Source code will be made
available.
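
To make the idea of balancing dropout-based uncertainty with sample diversity concrete, the following is a minimal sketch and not the authors' method, whose details the abstract does not give. It assumes precomputed foundation-model features (for example from DINOv2) and a linear classifier head. The function names, the `pool_factor` parameter, and the choice of greedy farthest-point selection for diversity are all hypothetical and introduced only for illustration.

```python
import numpy as np

def dropout_entropy(features, weights, n_passes=20, p_drop=0.5, seed=0):
    """Predictive entropy under Monte Carlo dropout on the input features
    of a linear head (an illustrative stand-in for an uncertainty estimate)."""
    rng = np.random.default_rng(seed)
    probs = np.zeros((features.shape[0], weights.shape[1]))
    for _ in range(n_passes):
        # Random dropout mask with inverted-dropout rescaling.
        mask = rng.random(features.shape) >= p_drop
        logits = (features * mask / (1.0 - p_drop)) @ weights
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        probs += exp / exp.sum(axis=1, keepdims=True)
    probs /= n_passes  # average softmax output over stochastic passes
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_batch(features, weights, budget, pool_factor=5, seed=0):
    """Hypothetical selection rule: shortlist the most uncertain samples,
    then pick a diverse subset by greedy farthest-point selection."""
    entropy = dropout_entropy(features, weights, seed=seed)
    n_candidates = min(len(features), budget * pool_factor)
    candidates = np.argsort(-entropy)[:n_candidates]
    cand_feats = features[candidates]

    chosen = [0]  # start from the single most uncertain candidate
    dists = np.linalg.norm(cand_feats - cand_feats[0], axis=1)
    dists[0] = -np.inf  # never re-select an already chosen sample
    while len(chosen) < min(budget, n_candidates):
        nxt = int(np.argmax(dists))  # farthest from everything chosen so far
        chosen.append(nxt)
        new_d = np.linalg.norm(cand_feats - cand_feats[nxt], axis=1)
        dists = np.minimum(dists, new_d)
        dists[nxt] = -np.inf
    return candidates[chosen]
```

With a feature matrix `feats` of shape `(N, D)` and head weights `W` of shape `(D, C)`, a call such as `select_batch(feats, W, budget=10)` returns ten distinct pool indices to send for labeling. In this sketch a larger `pool_factor` shifts the balance toward diversity and a smaller one toward uncertainty.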