Rotated and Masked Image Modeling: A Superior Self-Supervised Method for Classification.

Daisong Yan, Xun Gong, Zhemin Zhang

IEEE Signal Process. Lett. (2023)

Abstract
Masked image modeling (MIM) has performed excellently as a transformer-based self-supervised method via random masking and reconstruction. However, because the unmasked image patches do not participate in the loss computation, MIM cannot use the data effectively and wastes much computation. This drawback usually limits the learning ability of the pre-trained model when pre-training on small-scale datasets. To solve this problem, we propose a novel self-supervised learning method for small-scale datasets called RotMIM. Unlike MIM, RotMIM has a different pretext task: recognizing the rotation angle applied to the unmasked patches. RotMIM can thus fully utilize the data and provide a stronger self-supervised signal. Moreover, to fit RotMIM, we propose a data augmentation method called FeaMix. FeaMix ensures that the mixed regions are consistent with RotMIM's premise that each basic unit of semantic information in an image has the same size; this consistency guarantees clean tokenization during fine-tuning after pre-training. Our proposals outperform state-of-the-art self-supervised methods on three popular datasets: Mini-ImageNet, Caltech-256, and CIFAR-100.
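Since only the abstract is available here, the exact formulation of RotMIM is not shown. The following is a minimal PyTorch sketch of how the pretext targets it describes could be constructed, assuming 90-degree rotations applied per patch, a binary random mask, and a 4-way rotation label for each unmasked patch. All names (build_rotmim_targets, mask_ratio, and so on) are illustrative assumptions, not identifiers from the paper.

```python
import torch

def build_rotmim_targets(images, patch_size=16, mask_ratio=0.6):
    """Hypothetical RotMIM-style target builder (not the paper's code).

    images: (B, C, H, W) tensor.
    Returns:
      tiles_out: (B, N, C, p, p) patch tiles, with unmasked tiles rotated.
      mask:      (B, N) bool, True = masked (reconstruction target).
      rot:       (B, N) long in {0,1,2,3}, rotation class of each patch
                 (0/90/180/270 degrees); supervises only unmasked patches.
    """
    B, C, H, W = images.shape
    ph, pw = H // patch_size, W // patch_size
    num_patches = ph * pw

    # Random mask: masked patches would be reconstructed as in plain MIM.
    mask = torch.rand(B, num_patches) < mask_ratio

    # Random rotation class per patch; used as the label for unmasked patches.
    rot = torch.randint(0, 4, (B, num_patches))

    # Split the image into non-overlapping patch tiles: (B, N, C, p, p).
    tiles = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    tiles = tiles.permute(0, 2, 3, 1, 4, 5).reshape(
        B, num_patches, C, patch_size, patch_size)

    # Rotate only the unmasked tiles by their sampled angle.
    tiles_out = tiles.clone()
    for k in range(1, 4):
        sel = (rot == k) & ~mask
        tiles_out[sel] = torch.rot90(tiles[sel], k, dims=(-2, -1))

    return tiles_out, mask, rot
```

Under this reading, a combined loss would apply a reconstruction term (e.g. MSE) on the masked tiles and a cross-entropy term on the rotation logits of the unmasked tiles, so every patch contributes to the loss; the actual loss weighting and architecture details are in the paper, not this sketch.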
Key words
Self-supervised learning, masked image modeling, small-scale datasets, vision transformers