
Harder Tasks Need More Experts: Dynamic Routing in MoE Models

Annual Meeting of the Association for Computational Linguistics (2024)

Abstract
In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity, our method dynamically selects experts based on the confidence level in expert selection for each input. This allows for a more efficient utilization of computational resources, activating more experts for complex tasks requiring advanced reasoning and fewer for simpler tasks. Through extensive evaluations, our dynamic routing method demonstrates substantial improvements over conventional Top-2 routing across various benchmarks, achieving an average improvement of 0.7% with less than 90% activated parameters. Further analysis shows our model dispatches more experts to tasks requiring complex reasoning skills, like BBH, confirming its ability to dynamically allocate computational resources in alignment with the input's complexity. Our findings also highlight a variation in the number of experts needed across different layers of the transformer model, offering insights into the potential for designing heterogeneous MoE frameworks. The code and models are available at https://github.com/ZhenweiAn/Dynamic_MoE.
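The abstract describes selecting experts by routing confidence rather than a fixed Top-K. A minimal sketch of one way to realize this is threshold-based (top-p) routing: for each token, activate the smallest set of experts whose cumulative router probability reaches a threshold p. The function name `dynamic_top_p_routing`, the threshold value, and the `max_experts` cap are illustrative assumptions, not taken from the authors' released code (see the repository linked above for their implementation).

```python
import torch
import torch.nn.functional as F

def dynamic_top_p_routing(router_logits: torch.Tensor,
                          p: float = 0.4,
                          max_experts: int = 8):
    """Per token, activate the smallest set of experts whose cumulative
    router probability reaches the threshold p (illustrative sketch).

    router_logits: (num_tokens, num_experts) raw router scores.
    Returns a boolean mask of activated experts and renormalized weights.
    """
    probs = F.softmax(router_logits, dim=-1)                      # (T, E)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)                      # (T, E)
    # Keep an expert while the probability mass accumulated *before* it
    # is still below p; this always keeps the top-1 expert.
    keep_sorted = (cumulative - sorted_probs) < p                 # (T, E)
    keep_sorted[..., max_experts:] = False                        # optional cap
    # Scatter the sorted mask back to the original expert order.
    mask = torch.zeros_like(probs, dtype=torch.bool) \
                .scatter(-1, sorted_idx, keep_sorted)
    weights = probs * mask
    weights = weights / weights.sum(dim=-1, keepdim=True)         # renormalize
    return mask, weights

# Toy example: a confident token activates fewer experts than an uncertain one.
logits = torch.tensor([[4.0, 0.1, 0.0, -0.2],   # peaked router -> 1 expert
                       [0.5, 0.4, 0.3, 0.2]])   # flat router   -> 2 experts
mask, weights = dynamic_top_p_routing(logits, p=0.5)
print(mask.sum(dim=-1))  # number of experts activated per token
```

Under this scheme the compute cost adapts automatically: tokens where the router is confident fall back to near Top-1 routing, while ambiguous tokens (e.g., from reasoning-heavy inputs like BBH) spread mass across more experts, which matches the per-task expert counts the abstract reports.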