Spatial-Enhanced Multi-Level Wavelet Patching in Vision Transformers

IEEE Signal Processing Letters (2024)

Abstract
By integrating wavelet transforms into the image patching stage of ViT, we use multi-level wavelet transforms to decompose images into a diverse set of frequency-domain features. These features, combined with spatial features at the same scales, enrich image detail and improve ViT's ability to capture intricate textures and sharp edges. As a result, ViT gains 2.7% accuracy on the ImageNet100 dataset. The wavelet patching module is designed to be versatile and fits into various ViT derivatives without architecture modifications. It improves the performance of several leading vision transformers by 0.46-4.3% while preserving parameter efficiency and without a notable increase in FLOPs.
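
The idea of pairing multi-level wavelet sub-bands with spatial features at the same scale can be illustrated with a small NumPy sketch. This is not the paper's module: the Haar wavelet, the strided subsampling used as a stand-in for a learned spatial branch, and the function names `haar_dwt2` and `wavelet_spatial_features` are all assumptions made for illustration only.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on an (H, W) array.

    H and W must be even. Returns four sub-bands of shape (H/2, W/2):
    the low-pass approximation and three detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0        # local average (low frequency)
    row_diff = (a + b - c - d) / 2.0  # top row minus bottom row
    col_diff = (a - b + c - d) / 2.0  # left column minus right column
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return ll, row_diff, col_diff, diag


def wavelet_spatial_features(img, levels=2):
    """Build one feature stack per level (hypothetical helper).

    Each stack holds a spatial map plus the four wavelet sub-bands, all at
    the same resolution. The spatial map here is plain strided subsampling,
    an assumed placeholder for a learned strided convolution.
    """
    feats = []
    ll = img
    spatial = img
    for _ in range(levels):
        ll, row_diff, col_diff, diag = haar_dwt2(ll)  # recurse on low-pass band
        spatial = spatial[0::2, 0::2]                 # match the sub-band scale
        feats.append(np.stack([spatial, ll, row_diff, col_diff, diag], axis=0))
    return feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((224, 224))  # single-channel toy input
    for level, f in enumerate(wavelet_spatial_features(image, levels=2), start=1):
        print(level, f.shape)
```

In a full model, each per-level stack would presumably be projected to token embeddings in place of the usual patch embedding; that step is omitted here because the abstract does not describe it.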
Keywords
Wavelet domain,Transformers,Frequency-domain analysis,Discrete wavelet transforms,Convolution,Standards,Image color analysis,Vision transformer,image patching,wavelet transform,low-level feature