InfMAE: A Foundation Model in Infrared Modality
CoRR (2024)
Abstract
In recent years, foundation models have swept the computer vision field and
facilitated the development of various tasks across different modalities.
However, how to design a foundation model for the infrared modality remains
an open question. In this paper, we propose InfMAE, a foundation model for
infrared imagery. We release an infrared dataset, called Inf30, to address
the lack of large-scale data for self-supervised learning in the infrared
vision community. Moreover, we design an information-aware masking strategy
tailored to infrared images. This masking strategy places greater emphasis
on the information-rich regions of infrared images during self-supervised
learning, which is conducive to learning generalized representations. In
addition, we adopt a multi-scale encoder to enhance the performance of the
pre-trained encoder in downstream tasks. Finally, based on the fact that
infrared images lack fine details and texture information, we design an
infrared decoder module, which further improves performance on downstream
tasks. Extensive experiments show that our proposed InfMAE outperforms other
supervised and self-supervised learning methods on three downstream tasks.
Our code will be made public at https://github.com/liufangcen/InfMAE.
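
To make the abstract's masking idea concrete, below is a minimal PyTorch sketch of what an information-aware mask could look like for MAE-style pre-training. The per-patch "information" score (here approximated by local intensity variance), the Gumbel-noise sampling, and the function name information_aware_mask are all illustrative assumptions, not the paper's actual formulation.

# Hedged sketch of an information-aware masking strategy for MAE-style
# pre-training on infrared images. Assumption (not from the paper): the
# per-patch information score is local intensity variance, and high-score
# patches are masked with higher probability so the model must reconstruct
# the information-rich regions.
import torch

def information_aware_mask(images, patch_size=16, mask_ratio=0.75):
    """images: (B, 1, H, W) single-channel infrared batch.
    Returns a boolean mask of shape (B, N); True means the patch is masked."""
    B, C, H, W = images.shape
    # Split each image into non-overlapping patches: (B, C, h, w, p, p).
    patches = images.unfold(2, patch_size, patch_size) \
                    .unfold(3, patch_size, patch_size)
    patches = patches.reshape(B, -1, patch_size * patch_size)  # (B, N, p*p)
    # Proxy information score: per-patch intensity variance.
    score = patches.var(dim=-1)                                # (B, N)
    # Higher score -> more likely to be masked; Gumbel noise adds
    # stochasticity so the mask varies across training iterations.
    u = torch.rand_like(score).clamp_min(1e-9)
    noise = -torch.log(-torch.log(u))
    ranking = score / (score.max(dim=1, keepdim=True).values + 1e-6) + 0.5 * noise
    N = score.shape[1]
    num_masked = int(mask_ratio * N)
    idx = ranking.argsort(dim=1, descending=True)[:, :num_masked]
    mask = torch.zeros(B, N, dtype=torch.bool, device=images.device)
    mask.scatter_(1, idx, True)
    return mask

if __name__ == "__main__":
    imgs = torch.rand(2, 1, 224, 224)        # dummy infrared batch
    m = information_aware_mask(imgs)
    print(m.shape, m.float().mean().item())  # (2, 196), ~0.75

A plain random mask would treat flat background and salient targets equally; biasing the mask toward high-variance patches is one plausible way to force the encoder to model the informative regions, which is the behavior the abstract attributes to InfMAE's strategy.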