Transformers with Multiresolution Attention Heads

ICLR 2023

Abstract
We propose the Transformer with Multiresolution-head Attention (MrsFormer), a class of efficient transformers inspired by the multiresolution approximation (MRA) for approximating a signal f using wavelet bases. MRA decomposes a signal into components that lie on orthogonal subspaces at different scales. Similarly, MrsFormer decomposes the attention heads in multi-head attention into fine-scale and coarse-scale heads, modeling the attention patterns between tokens and between groups of tokens. Computing the attention heads in MrsFormer requires significantly less computation and a smaller memory footprint than the standard softmax transformer with multi-head attention. We analyze and validate the advantage of MrsFormer over standard transformers on a wide range of applications, including image and time series classification.
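
The abstract does not detail the construction, but as a minimal sketch of the coarse-scale idea, assuming average pooling over non-overlapping token groups (the function name, pooling scheme, and `group_size` parameter below are illustrative assumptions, not the paper's implementation), a coarse-scale head can attend from individual tokens to pooled groups of tokens:

```python
import torch
import torch.nn.functional as F

def coarse_scale_attention(q, k, v, group_size=4):
    """Illustrative coarse-scale attention head (a sketch, not the
    paper's exact formulation): keys and values are average-pooled over
    non-overlapping groups of `group_size` tokens, so each query attends
    to groups of tokens rather than to individual tokens.

    q, k, v: (batch, seq_len, d_head); seq_len is assumed to be
    divisible by group_size for simplicity.
    """
    b, n, d = k.shape
    # Pool keys and values down to one representative per token group.
    k_coarse = k.reshape(b, n // group_size, group_size, d).mean(dim=2)
    v_coarse = v.reshape(b, n // group_size, group_size, d).mean(dim=2)
    # Standard softmax attention against the shorter pooled sequence:
    # cost drops from O(n^2 * d) to O(n * (n / group_size) * d).
    scores = q @ k_coarse.transpose(-2, -1) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v_coarse
```

Because the pooled key/value sequence is `group_size` times shorter, the attention matrix shrinks by the same factor, which is the source of the compute and memory savings the abstract claims for the coarse-scale heads.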
Keywords
transformer, multiresolution analysis, attention heads