Producing Plankton Classifiers that are Robust to Dataset Shift
CoRR (2024)
Abstract
Modern plankton high-throughput monitoring relies on deep learning
classifiers for species recognition in water ecosystems. Despite satisfactory
nominal performances, a significant challenge arises from Dataset Shift, which
causes performances to drop during deployment. In our study, we integrate the
ZooLake dataset with manually-annotated images from 10 independent days of
deployment, serving as test cells to benchmark Out-Of-Dataset (OOD)
performances. Our analysis reveals instances where classifiers, initially
performing well in In-Dataset conditions, encounter notable failures in
practical scenarios. For example, a MobileNet with a 92
shows a 77
OOD performance drops and propose a preemptive assessment method to identify
potential pitfalls when classifying new data, and pinpoint features in OOD
images that adversely impact classification. We present a three-step pipeline:
(i) identifying OOD degradation compared to nominal test performance, (ii)
conducting a diagnostic analysis of degradation causes, and (iii) providing
solutions. We find that ensembles of BEiT vision transformers, with targeted
augmentations addressing OOD robustness, geometric ensembling, and
rotation-based test-time augmentation, constitute the most robust model, which
we call BEsT model. It achieves an 83% OOD accuracy, including
on container classes. Moreover, it exhibits lower sensitivity to dataset shift,
and reproduces well the plankton abundances. Our proposed pipeline is
applicable to generic plankton classifiers, contingent on the availability of
suitable test cells. By identifying critical shortcomings and offering
practical procedures to fortify models against dataset shift, our study
contributes to the development of more reliable plankton classification
technologies.
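As an illustration of the rotation-based test-time augmentation combined with model ensembling mentioned above, here is a minimal sketch. It is not the paper's actual BEsT implementation; the model interface and input shapes are hypothetical, and predictions are combined with a simple arithmetic mean:

```python
import numpy as np

def tta_ensemble_predict(models, image):
    """Average class probabilities over an ensemble of models and over
    the four 90-degree rotations of the input image (rotation-based
    test-time augmentation). Each model maps an HxWxC array to a
    probability vector over classes."""
    probs = []
    for model in models:
        for k in range(4):  # rotations by 0, 90, 180, 270 degrees
            rotated = np.rot90(image, k=k, axes=(0, 1))
            probs.append(model(rotated))
    return np.mean(probs, axis=0)  # averaged probability vector
```

The abstract also mentions geometric ensembling, which would combine the per-model probabilities via a geometric mean (followed by renormalization) rather than the arithmetic mean used in this sketch.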