Enhancing Efficiency in Sparse Models with Sparser Selection
arXiv (2024)
Abstract
Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged
as an effective approach for scaling Transformer models. However, they often
suffer from computational inefficiency, since a significant number of parameters
are unnecessarily involved in computation through multiplications by zero or
low activation values. To address this issue, we present XMoE, a novel MoE
design that enhances both the efficacy and efficiency of sparse MoE models.
XMoE leverages small experts and a threshold-based router to enable tokens to
selectively engage only essential parameters. Our extensive experiments on
language modeling and machine translation tasks demonstrate that XMoE enhances
model performance and can decrease the computation load at MoE layers by over
50% without sacrificing performance. Furthermore, we demonstrate the
versatility of XMoE by applying it to dense models, enabling sparse
computation during inference. We provide a comprehensive analysis and make our
code available at https://anonymous.4open.science/r/XMoE.
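To make the threshold-based routing idea concrete, below is a minimal, hypothetical sketch in PyTorch: each token computes routing probabilities over a pool of small experts and keeps only the experts whose probability clears a threshold, so the number of active experts varies per token. The function name, the per-expert threshold rule, the top-1 fallback, and the renormalization step are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def threshold_route(router_logits: torch.Tensor, threshold: float = 0.1):
    """Hypothetical threshold-based routing sketch (not the paper's exact rule).

    router_logits: [num_tokens, num_experts] scores from a linear router.
    Returns per-expert weights in which sub-threshold experts are zeroed out,
    so each token activates a variable (often small) number of experts.
    """
    probs = F.softmax(router_logits, dim=-1)          # routing probabilities
    mask = probs >= threshold                         # keep only confident experts
    # Guarantee at least one expert per token (the argmax) so no token is dropped.
    top1 = F.one_hot(probs.argmax(dim=-1), probs.size(-1)).bool()
    mask = mask | top1
    kept = probs * mask                               # zero out low-probability experts
    weights = kept / kept.sum(dim=-1, keepdim=True)   # renormalize over kept experts
    return weights, mask

# Example: 4 tokens routed over 8 small experts.
logits = torch.randn(4, 8)
weights, mask = threshold_route(logits, threshold=0.15)
print(mask.sum(dim=-1))  # number of experts each token actually uses
```

Compared with fixed top-k routing, a threshold rule like this lets tokens engage fewer experts when the router is confident, which is one way to read the abstract's claim of sparser selection and reduced computation at MoE layers.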