J-MAE: Jigsaw Meets Masked Autoencoders in X-Ray Security Inspection

Weichen Xu, Jian Cao, Tianhao Fu, Awen Bai, Ruilong Ren, Zicong Hu, Xixin Cao, Xing Zhang

2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

Abstract
X-ray security inspection aims to identify restricted items in order to protect public safety. Because unsupervised learning has received little attention in this field, models pre-trained on natural images are used instead, which leads to suboptimal results in downstream tasks. Previous works lose the relative positional relationships between patches during pre-training, which is detrimental for X-ray images, since these lack texture and rely on shape. In this paper, we propose the jigsaw-style MAE (J-MAE), which preserves relative position information by shuffling the position encodings of visible patches. This forces the network to perform semantic reasoning in order to understand the shape and composition of X-ray objects. We also propose the Incremental Shuffling Module (ISM) and the Permute Predicting Module (PPM) to stabilize training and accelerate convergence. The proposed method consistently outperforms other methods on three downstream X-ray security inspection datasets.
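The core idea of shuffling the position encodings of visible patches can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `jigsaw_positions`, the `mask_ratio` and `shuffle_ratio` parameters, and the choice to permute position indices only among the visible patches are all assumptions made for illustration. The abstract does not specify how ISM schedules the shuffling or how PPM is built, so neither is modeled here.

```python
import random


def jigsaw_positions(num_patches, mask_ratio, shuffle_ratio, seed=0):
    """Pick visible patches and assign them partly shuffled position indices.

    Hypothetical sketch: every name and parameter here is an assumption,
    not taken from the J-MAE paper.
    """
    rng = random.Random(seed)

    # MAE-style random masking: keep a random subset of patches visible.
    num_visible = round(num_patches * (1 - mask_ratio))
    visible = sorted(rng.sample(range(num_patches), num_visible))

    # Start with every visible patch holding its true position index.
    assigned = list(visible)

    # Choose a fraction of the visible slots and permute their position
    # indices among themselves (the jigsaw step).
    num_shuffled = round(num_visible * shuffle_ratio)
    slots = rng.sample(range(num_visible), num_shuffled)
    permuted_slots = slots[:]
    rng.shuffle(permuted_slots)
    values = [assigned[s] for s in permuted_slots]
    for slot, value in zip(slots, values):
        assigned[slot] = value

    return visible, assigned


if __name__ == "__main__":
    visible, assigned = jigsaw_positions(196, mask_ratio=0.75, shuffle_ratio=0.5)
    # Patch visible[i] would receive the position embedding of index assigned[i].
    print(list(zip(visible, assigned))[:5])
```

In an encoder, the token for patch `visible[i]` would be summed with the position embedding at index `assigned[i]`, so the network has to infer the true layout from patch content rather than read it directly from the encoding.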
Keywords
X-ray security inspection, Unsupervised learning, Masked image modeling, Jigsaw puzzles