Higher Layers Need More LoRA Experts
CoRR (2024)
Abstract
Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA)
offer training efficiency on Large Language Models, but their impact on model
performance remains limited. Recent efforts integrate LoRA and
Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite
promising results, research on improving the efficiency of LoRA with MoE is
still in its early stages. Recent studies have shown that experts in the MoE
architecture have different strengths and also exhibit some redundancy. Does
this statement also apply to parameter-efficient MoE? In this paper, we
introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise
Expert Allocation (MoLA), for Transformer-based models, where each model
layer has the flexibility to employ a varying number of LoRA experts. We
investigate several architectures with varying layer-wise expert
configurations. Experiments on six well-known NLP and commonsense QA benchmarks
demonstrate that MoLA achieves equal or superior performance compared to all
baselines. We find that, for a fixed total number of experts, allocating more
LoRA experts to higher layers further enhances model effectiveness. With far
fewer parameters, this allocation strategy outperforms the setting with the
same number of experts in every layer. This work can be widely used as
a plug-and-play parameter-efficient tuning approach for various applications.
The code is available at https://github.com/GCYZSL/MoLA.
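
The layer-wise allocation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The helper names (`allocate_experts`, `lora_expert_params`), the grouping of layers into equal contiguous blocks, the per-layer linear router, and all sizes below are assumptions chosen only for illustration.

```python
def allocate_experts(num_layers, experts_per_group):
    """Assign a LoRA expert count to every layer.

    Assumption: layers are split into equal contiguous groups, lowest
    layers first, and every layer in group g gets experts_per_group[g].
    """
    groups = len(experts_per_group)
    if num_layers % groups != 0:
        raise ValueError("num_layers must be divisible by the number of groups")
    size = num_layers // groups
    return [experts_per_group[i // size] for i in range(num_layers)]


def lora_expert_params(allocation, d_in, d_out, rank):
    """Count trainable parameters for one adapted weight matrix per layer.

    Each LoRA expert holds A (rank x d_in) and B (d_out x rank).
    Assumption: each layer also has a linear router of shape d_in x n_experts.
    """
    total = 0
    for n_experts in allocation:
        total += n_experts * rank * (d_in + d_out)  # expert matrices A and B
        total += d_in * n_experts                   # hypothetical router weights
    return total


# Hypothetical 8-layer model with toy dimensions.
uniform = allocate_experts(8, [4, 4, 4, 4])    # same count in every layer
top_heavy = allocate_experts(8, [1, 2, 3, 4])  # more experts in higher layers

uniform_params = lora_expert_params(uniform, d_in=16, d_out=16, rank=2)
top_heavy_params = lora_expert_params(top_heavy, d_in=16, d_out=16, rank=2)

print(top_heavy)                         # [1, 1, 2, 2, 3, 3, 4, 4]
print(uniform_params, top_heavy_params)  # 2560 1600
```

In this toy setting the top-heavy configuration uses 20 experts instead of 32, which shows how such an allocation can reduce the trainable parameter count. Whether it also matches or improves accuracy is an empirical question that the sketch does not address.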