DBFFT: Adversarial-robust dual-branch frequency domain feature fusion in vision transformers

Information Fusion (2024)

Abstract
Vision transformers (ViTs) have been successful in image recognition. However, it is difficult for ViTs to capture comprehensive information and resist adversarial perturbations by learning features from the spatial domain alone. Features with frequency domain information also play an important role in image classification and robustness improvement. In particular, the relative importance of spatial and frequency domain feature representations should vary depending on the encoding stage. Previous studies lack consideration of the flexible fusion of feature representations from different domains. To address this limitation, we propose a novel dual-branch adaptive frequency domain feature fusion architecture for Transformers with good classification ability and strong adversarial robustness, namely DBFFT. In each layer, we design two parallel Fourier transform and self-attention branches to learn hidden representations from the frequency domain and spatial domain, respectively. These are then adaptively weighted and fused according to their learned importance. We further propose a dual-branch patch embedding fusion module. The module introduces different convolutional paths to extract input image features at different scales. The features are then embedded and combined into more informative tokens. Our DBFFT architecture can make full use of diverse domain and scale information, which benefits image classification and enhances robustness against adversarial interference. Experimental results show that our DBFFT achieves promising performance and robustness on many image classification datasets and robustness benchmarks with favorable accuracy-complexity trade-offs.
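The per-layer idea described in the abstract (a Fourier branch and a self-attention branch run in parallel, then combined with learned weights) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the exact operators, so the sketch assumes an FNet-style frequency branch (real part of a 2D DFT over the token and channel axes), single-head attention with identity projections, and a softmax over two scalar gate logits as the "learned importance". All function and parameter names (`dft_real`, `self_attention`, `fuse`, `gate_logits`) are hypothetical.

```python
import cmath
import math


def dft_real(tokens):
    """Frequency branch (assumed): real part of a 2D DFT over (token, channel)."""
    n, d = len(tokens), len(tokens[0])
    out = [[0.0] * d for _ in range(n)]
    for k in range(n):
        for l in range(d):
            acc = 0j
            for a in range(n):
                for b in range(d):
                    angle = -2 * math.pi * (k * a / n + l * b / d)
                    acc += tokens[a][b] * cmath.exp(1j * angle)
            out[k][l] = acc.real
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Spatial branch (assumed): single-head attention with identity Q/K/V."""
    n, d = len(tokens), len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        out.append([sum(w[j] * tokens[j][c] for j in range(n))
                    for c in range(d)])
    return out


def fuse(tokens, gate_logits):
    """Weight the two branches by softmax(gate_logits) and sum them.

    gate_logits stands in for two learnable scalars per layer (an assumption);
    in a trained model they would be updated by gradient descent.
    """
    w_freq, w_spat = softmax(gate_logits)
    freq = dft_real(tokens)
    spat = self_attention(tokens)
    return [[w_freq * f + w_spat * s for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(freq, spat)]


if __name__ == "__main__":
    x = [[1.0, 2.0], [1.0, 2.0]]   # two tokens, two channels
    print(fuse(x, [0.0, 0.0]))     # equal logits weight both branches equally
```

Because the gate is a softmax, the two branch weights always sum to one, so shifting the logits per layer moves the representation smoothly between frequency-dominated and attention-dominated mixing, which is one simple way to let the relative importance vary with the encoding stage.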
Keywords
Frequency domain, Domain fusion, Scale fusion, Vision transformer, Adversarial learning