Training-Free Pretrained Model Merging
CVPR 2024
Abstract
Recently, model merging techniques have emerged as a way to combine
multiple single-talent models into a single multi-talent model. However,
previous work in this field has either required additional training or
fine-tuning, or required that the models share the same pre-trained
initialization. In this work, we identify a common drawback of prior
methods: unit similarity is inconsistent between the weight space and
the activation space. To address this inconsistency, we propose a new
model merging framework, called merging under dual-space constraints
(MuDSC). Specifically, instead of maximizing an objective in a single
space alone, we search for permutation matrices that lie in a region of
uniformly high similarity in both spaces, obtained through a linear
combination of the activation and weight similarity matrices. To improve
usability, we also adapt the method to group structures, including
Multi-Head Attention and Group Normalization. Comprehensive experimental
comparisons show that MuDSC can significantly improve the performance of
merged models across various task combinations and architectures.
Furthermore, visualizing the merged model in the multi-task loss
landscape shows that MuDSC places it in the overlapping region where the
loss is low for every task. Our code is publicly available at
https://github.com/zju-vipa/training_free_model_merging.
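The core matching step described above (scoring unit pairs by a linear combination of activation and weight similarity, then choosing the permutation with the highest total score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it considers a single layer in two models of equal width, uses cosine similarity in both spaces, and introduces a hypothetical balance weight `alpha`. The helper names are likewise hypothetical, and brute-force search over permutations stands in for a proper linear-assignment solver, so it only suits toy sizes.

```python
import itertools
import math


def cosine_sim_matrix(a, b):
    """Cosine similarity between every row of a and every row of b.

    Each row describes one unit: its weight vector, or its activations
    over a batch of inputs (cosine similarity is an assumption here).
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors

    return [[sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
            for u in a]


def combine(sim_act, sim_w, alpha):
    """Linear combination of activation and weight similarity.

    alpha is a hypothetical balance weight: 1.0 gives activation-only
    matching, 0.0 gives weight-only matching.
    """
    n = len(sim_act)
    return [[alpha * sim_act[i][j] + (1 - alpha) * sim_w[i][j] for j in range(n)]
            for i in range(n)]


def best_permutation(sim):
    """Return perm maximizing sum_i sim[i][perm[i]].

    Brute force over all n! permutations, for toy sizes only; a real
    implementation would use a linear-assignment (e.g. Hungarian) solver.
    """
    n = len(sim)
    return max(itertools.permutations(range(n)),
               key=lambda p: sum(sim[i][p[i]] for i in range(n)))


def merge_layer(w_a, w_b, perm):
    """Average unit i of model A with its matched unit perm[i] of model B."""
    return [[(x + y) / 2 for x, y in zip(w_a[i], w_b[perm[i]])]
            for i in range(len(w_a))]


if __name__ == "__main__":
    # Toy setup: model B holds model A's units in a shuffled order.
    w_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    w_b = [w_a[2], w_a[0], w_a[1]]
    act_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, -1.0, 1.0, -1.0]]
    act_b = [act_a[2], act_a[0], act_a[1]]

    sim = combine(cosine_sim_matrix(act_a, act_b),
                  cosine_sim_matrix(w_a, w_b), alpha=0.5)
    perm = best_permutation(sim)
    print(perm)  # prints (1, 2, 0)
    print(merge_layer(w_a, w_b, perm))
```

Setting `alpha` to 1.0 or 0.0 recovers the single-space objectives that MuDSC is contrasted with. In a full network, permuting the output units of one layer would also require permuting the input dimensions of the next layer so that the merged model stays consistent. The sketch omits this, along with the group-structure handling for Multi-Head Attention and Group Normalization.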