Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
arXiv (2024)
Abstract
We introduce an Outlier-Efficient Modern Hopfield Model (termed
OutEffHop) and use it to address the outlier-induced challenge of
quantizing gigantic transformer-based models. Our main contribution is a novel
associative memory model facilitating outlier-efficient associative
memory retrievals. Interestingly, this memory model manifests a model-based
interpretation of an outlier-efficient attention mechanism
(Softmax_1): it is an approximation of the memory retrieval process of
OutEffHop. Methodologically, this allows us to debut novel
outlier-efficient Hopfield layers, a powerful attention alternative with
superior post-quantization performance. Theoretically, the Outlier-Efficient
Modern Hopfield Model retains and improves the desirable properties of the
standard modern Hopfield models, including fixed point convergence and
exponential storage capacity. Empirically, we demonstrate the proposed model's
efficacy across large-scale transformer-based and Hopfield-based models
(including BERT, OPT, ViT and STanHop-Net), benchmarking against
state-of-the-art methods including Clipped_Softmax and
Gated_Attention. Notably, OutEffHop achieves on average
∼22+% reductions in both average kurtosis and maximum infinity norm of
model outputs across four models.
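
For concreteness, Softmax_1 is commonly defined by adding 1 to the softmax normalizer, softmax_1(z)_i = exp(z_i) / (1 + sum_j exp(z_j)), so attention weights may sum to less than 1 and a head can place near-zero mass everywhere rather than being forced onto outlier tokens. Below is a minimal NumPy sketch of Softmax_1 together with a modern-Hopfield-style retrieval step that uses it; the function names, the one-step update xi <- Xi softmax_1(beta Xi^T xi), and the toy data are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def softmax_1(z, axis=-1):
    # Softmax_1(z)_i = exp(z_i) / (1 + sum_j exp(z_j)): the extra 1
    # acts as an implicit zero logit, so the weights may sum to less
    # than 1 instead of dumping probability mass onto outlier tokens.
    m = np.maximum(z.max(axis=axis, keepdims=True), 0.0)  # stability shift
    e = np.exp(z - m)
    return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))

def retrieve(xi, patterns, beta=1.0, steps=1):
    # Hypothetical retrieval step in the style of modern Hopfield
    # networks (query xi in R^d, stored patterns as columns of a
    # d x M matrix), with softmax swapped for Softmax_1 per the
    # abstract's claim that Softmax_1 approximates OutEffHop's
    # memory retrieval process.
    for _ in range(steps):
        xi = patterns @ softmax_1(beta * (patterns.T @ xi))
    return xi

# Toy usage: a noisy query should retrieve the nearest stored pattern.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((16, 5))              # 5 patterns in R^16
query = patterns[:, 2] + 0.1 * rng.standard_normal(16)
recovered = retrieve(query, patterns, beta=4.0)
print(int(np.argmax(patterns.T @ recovered)))        # typically 2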