ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
CoRR (2024)
Abstract
Activation sparsity refers to the existence of considerable
weakly-contributed elements among activation outputs. As a prevalent property
of the models using the ReLU activation function, it has been proven a
promising paradigm to boost model inference efficiency. Nevertheless, most
large language models (LLMs) adopt activation functions without intrinsic
activation sparsity (e.g., GELU and Swish). Some recent efforts have explored
introducing ReLU or its variants as the substitutive activation function to
help LLMs achieve activation sparsity and inference acceleration, but few can
simultaneously obtain high sparsity and comparable model performance. This
paper introduces an effective sparsification method named "ProSparse" to push
LLMs for higher activation sparsity without decreasing model performance.
Specifically, after substituting the activation function of LLMs with ReLU,
ProSparse adopts progressive sparsity regularization with a factor smoothly
increasing along sine curves in multiple stages. This can enhance activation
sparsity and alleviate performance degradation by avoiding radical shifts in
activation distribution. With ProSparse, we obtain high sparsity of 89.32% and
88.80%, with the sparsified models achieving comparable
performance to their original Swish-activated versions. Our inference
acceleration experiments further demonstrate the practical acceleration brought
by higher activation sparsity.
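The sketch below illustrates the core idea described above: after the activation function is replaced with ReLU, an L1 penalty on the ReLU outputs is added to the training loss, with a regularization factor that rises smoothly along a sine curve across multiple stages rather than jumping abruptly. The stage boundaries, peak factors, and function names here are illustrative assumptions, not the paper's exact configuration.

```python
import math
import torch

def sparsity_reg_factor(step, stage_boundaries, stage_peaks):
    """Progressive regularization factor (illustrative schedule).

    Within each stage, the factor rises from the previous stage's peak to the
    current stage's peak along a quarter sine curve, avoiding radical shifts
    in the activation distribution. `stage_boundaries` and `stage_peaks` are
    hypothetical hyperparameters for this sketch.
    """
    prev_end, prev_peak = 0, 0.0
    for end, peak in zip(stage_boundaries, stage_peaks):
        if step < end:
            progress = (step - prev_end) / (end - prev_end)  # 0 -> 1 inside the stage
            return prev_peak + (peak - prev_peak) * math.sin(0.5 * math.pi * progress)
        prev_end, prev_peak = end, peak
    return stage_peaks[-1]  # hold the final peak after the last stage

def prosparse_style_loss(lm_loss, relu_activations, step,
                         stage_boundaries=(1000, 3000, 6000),
                         stage_peaks=(1e-5, 5e-5, 1e-4)):
    """Language-modeling loss plus a progressively weighted L1 penalty on the
    (non-negative) ReLU activations, which pushes more of them toward zero."""
    lam = sparsity_reg_factor(step, stage_boundaries, stage_peaks)
    l1_penalty = relu_activations.abs().mean()
    return lm_loss + lam * l1_penalty
```

A smoothly increasing factor of this kind lets the model adapt gradually: early steps are dominated by the language-modeling objective, while later stages apply stronger pressure toward sparse activations.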