MoPE: Parameter-Efficient and Scalable Multimodal Fusion via Mixture of Prompt Experts
arXiv (2024)
Abstract
Prompt-tuning has demonstrated parameter-efficiency in fusing unimodal
foundation models for multimodal tasks. However, its limited adaptivity and
expressiveness lead to suboptimal performance when compared with other tuning
methods. In this paper, we address this issue by disentangling the vanilla
prompts to adaptively capture dataset-level and instance-level features.
Building upon this disentanglement, we introduce the mixture of prompt experts
(MoPE) technique to enhance expressiveness. MoPE leverages multimodal pairing
priors to route the most effective prompt on a per-instance basis. Compared to
vanilla prompting, our MoPE-based conditional prompting exhibits greater
expressiveness for multimodal fusion, scaling better with the training data and
the overall number of trainable parameters. We also study a regularization term
for expert routing, leading to emergent expert specialization, where different
experts focus on different concepts, enabling interpretable soft prompting.
Extensive experiments across three multimodal datasets demonstrate that our
method achieves state-of-the-art results, matching or even surpassing the
performance of fine-tuning, while requiring only 0.8% of the trainable
parameters. Code will be released: https://github.com/songrise/MoPE.
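The abstract describes routing a mixture of prompt experts conditioned on the paired modality, combined with a disentangled dataset-level prompt and a routing regularizer. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name `MoPEPromptRouter`, the soft-routing choice, and the coefficient-of-variation regularizer are illustrative assumptions and not the authors' released implementation.

```python
# Hypothetical sketch of MoPE-style conditional prompting (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoPEPromptRouter(nn.Module):
    """Routes among learnable prompt experts using features from the paired modality."""

    def __init__(self, embed_dim: int, cond_dim: int, num_experts: int = 4, prompt_len: int = 6):
        super().__init__()
        # Static prompt: shared across the dataset (dataset-level features).
        self.static_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        # Expert prompts: instance-level, mixed per example by the router.
        self.expert_prompts = nn.Parameter(torch.randn(num_experts, prompt_len, embed_dim) * 0.02)
        # Router conditioned on the complementary modality's pooled representation.
        self.router = nn.Linear(cond_dim, num_experts)

    def forward(self, cond_feat: torch.Tensor):
        # cond_feat: [B, cond_dim] pooled feature from the paired modality.
        logits = self.router(cond_feat)                       # [B, num_experts]
        weights = F.softmax(logits, dim=-1)                   # soft routing over experts
        # Weighted mixture of expert prompts -> instance-level dynamic prompt.
        dynamic = torch.einsum("be,eld->bld", weights, self.expert_prompts)  # [B, L, D]
        static = self.static_prompt.unsqueeze(0).expand(dynamic.size(0), -1, -1)
        # Concatenate dataset-level and instance-level prompts.
        prompts = torch.cat([static, dynamic], dim=1)         # [B, 2L, D]
        # Illustrative importance regularizer (load-balancing style) to encourage
        # expert specialization; the paper's exact regularizer may differ.
        importance = weights.sum(dim=0)
        reg = importance.var() / (importance.mean() ** 2 + 1e-6)
        return prompts, reg
```

In use, the returned prompts would be prepended to the frozen backbone's token embeddings for the prompted modality, and the regularizer added (with a small weight) to the task loss.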