Mixture of Experts Using Tensor Products
CoRR (2024)
Abstract
In multi-task learning, the conventional approach involves training a model
on multiple tasks simultaneously. However, the training signals from different
tasks can interfere with one another, potentially leading to negative
transfer. To mitigate this, we investigate if modular language models can
facilitate positive transfer and systematic generalization. Specifically, we
propose a novel modular language model, TensorPoly, that balances
parameter efficiency with nuanced routing methods. For the modules, we
reparameterize Low-Rank Adaptation (LoRA) as an entangled tensor built
with tensor product operations, and name the resulting approach TLoRA.
For the routing function, we tailor two routing functions according to
granularity: TensorPoly-I routes to each rank within the entangled
tensor, while TensorPoly-II offers a finer-grained routing approach targeting
each order of the entangled tensor. Experimental results on the
multi-task T0 benchmark demonstrate that: 1) all modular LMs surpass the
corresponding dense approaches, highlighting the potential of modular
language models to mitigate negative interference in multi-task learning
and deliver superior outcomes; 2) TensorPoly-I achieves higher parameter
efficiency in adaptation and outperforms other modular LMs, which shows
the potential of our approach in multi-task transfer learning.