Swin on Axes: Extending Swin Transformers to Quadtree Image Representations.

IEEE/CVF Winter Conference on Applications of Computer Vision (2024)

Abstract
In recent years, Transformer models have revolutionized machine learning. While this has produced impressive results in Natural Language Processing, Computer Vision quickly ran into computation and memory problems due to the high resolution and dimensionality of the input data. This is particularly true for video, where the number of tokens grows cubically with the frame and temporal resolutions. A first approach to this problem was the Vision Transformer, which partitions the input into embedded grid cells, lowering the effective resolution. More recently, Swin Transformers introduced a hierarchical scheme that brought the concepts of pooling and locality to transformers in exchange for much lower computational and memory costs. This work proposes a reformulation of the latter that views Swin Transformers as regular Transformers applied over a quadtree representation of the input, intrinsically providing a wider range of design choices for the attentional mechanism. Compared to similar approaches such as Swin and MaxViT, our method works on the full range of scales while using a single attentional mechanism, allowing us to simultaneously take into account both dense short-range and sparse long-range dependencies with low computational overhead and without introducing additional sequential operations, thus making full use of GPU parallelism.
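The quadtree view described in the abstract can be illustrated concretely: if the patch grid is flattened in quadtree (Morton / Z-order) order, every contiguous block of tokens corresponds to a square image window, so Swin-style window attention reduces to plain attention applied over fixed-length slices of the token sequence. The sketch below is illustrative only and is not the authors' implementation; the function names `morton_order` and `window_attention`, the grid size, and the window size are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): Swin-style window attention seen as
# plain attention over a quadtree (Morton / Z-order) layout of patch tokens.
import numpy as np

def morton_order(h, w):
    """Return the Z-order (quadtree) permutation of an h x w patch grid."""
    assert h == w and (h & (h - 1)) == 0, "sketch assumes a power-of-two square grid"
    def interleave(y, x):
        code = 0
        for bit in range(h.bit_length()):
            code |= ((x >> bit) & 1) << (2 * bit)
            code |= ((y >> bit) & 1) << (2 * bit + 1)
        return code
    idx = [(interleave(y, x), y * w + x) for y in range(h) for x in range(w)]
    return np.array([i for _, i in sorted(idx)])

def window_attention(tokens, window):
    """Scaled dot-product attention applied independently inside each window.

    tokens: (N, C) array already permuted to Morton order, so each contiguous
    run of `window` tokens is one quadtree cell (a square image region).
    """
    n, c = tokens.shape
    out = np.empty_like(tokens)
    for start in range(0, n, window):
        x = tokens[start:start + window]                   # one quadtree cell
        scores = x @ x.T / np.sqrt(c)                      # Q = K = V = x
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        out[start:start + window] = attn @ x
    return out

# Usage: an 8x8 grid of 16-dim patch embeddings; 16-token windows in Morton
# order correspond to 4x4 spatial windows, i.e. local (Swin-like) attention.
grid = np.random.randn(8 * 8, 16)
perm = morton_order(8, 8)
y = window_attention(grid[perm], window=16)
```

Because the quadtree linearization keeps every cell contiguous at every scale, the same slicing scheme can, in principle, select larger or sparser index sets to mix short-range and long-range dependencies with the same attention kernel, which is the design freedom the abstract refers to.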
Keywords
Quadtree, Swin Transformer, Computational Cost, Computer Vision, Attention Mechanism, Grid Cells, Transformer Model, Low Computational Cost, Long-range Dependencies, Memory Cost, Vision Transformer, Low Computational Overhead, Image Resolution, Batch Size, Input Image, Image Size, Object Recognition, Receptive Field, Language Model, Archaeological Sites, Local Attention, Attention Matrix, Transformer Block, Dilation Factor, Attention Heads, Global Attention, Attention Block, Input Tokens, Sequence Of Tokens, Attention Regions