Harder Tasks Need More Experts: Dynamic Routing in MoE Models

Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

Annual Meeting of the Association for Computational Linguistics (2024)

Abstract

In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity, our method dynamically selects experts based on the confidence level in expert selection for each input. This allows for a more efficient utilization of computational resources, activating more experts for complex tasks requiring advanced reasoning and fewer for simpler tasks. Through extensive evaluations, our dynamic routing method demonstrates substantial improvements over conventional Top-2 routing across various benchmarks, achieving an average improvement of 0.7% with less than 90% activated parameters. Further analysis shows our model dispatches more experts to tasks requiring complex reasoning skills, like BBH, confirming its ability to dynamically allocate computational resources in alignment with the input's complexity. Our findings also highlight a variation in the number of experts needed across different layers of the transformer model, offering insights into the potential for designing heterogeneous MoE frameworks. The code and models are available at https://github.com/ZhenweiAn/Dynamic_MoE.
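
To make the contrast with fixed Top-K routing concrete, the following is a minimal sketch of one way confidence-based selection could work: experts are added in order of router probability until their cumulative probability reaches a threshold. The abstract does not spell out the selection rule, so the cumulative-probability criterion, the function names `softmax` and `dynamic_route`, and the threshold value 0.4 are all illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(logits):
    """Convert router logits into a probability distribution over experts."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_route(logits, threshold=0.4):
    """Pick the smallest set of top experts whose cumulative probability
    reaches `threshold` (a hypothetical value chosen for illustration).

    Returns the selected expert indices and their router probabilities.
    """
    probs = softmax(logits)
    # Rank experts from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, cumulative = [], 0.0
    for i in order:
        selected.append(i)
        cumulative += probs[i]
        if cumulative >= threshold:
            break
    return selected, [probs[i] for i in selected]


# A confident router (one dominant logit) activates a single expert.
confident_experts, _ = dynamic_route([5.0, 0.0, 0.0, 0.0])

# An uncertain router (uniform logits) needs more experts to reach the threshold.
uncertain_experts, _ = dynamic_route([0.0, 0.0, 0.0, 0.0])
```

Under this sketch, a peaked router distribution (high confidence) yields few active experts and a flat one yields more, which matches the behaviour the abstract describes, whereas Top-2 routing would activate two experts in both cases.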
