Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
CVPR 2024
Abstract
Large vision-language models (VLMs) like CLIP have demonstrated good
zero-shot learning performance in the unsupervised domain adaptation task. Yet,
most transfer approaches for VLMs focus on either the language or visual
branches, overlooking the nuanced interplay between both modalities. In this
work, we introduce a Unified Modality Separation (UniMoS) framework for
unsupervised domain adaptation. Leveraging insights from modality gap studies,
we craft a nimble modality separation network that distinctly disentangles
CLIP's features into language-associated and vision-associated components. Our
proposed Modality-Ensemble Training (MET) method fosters the exchange of
modality-agnostic information while maintaining modality-specific nuances. We
align features across domains using a modality discriminator. Comprehensive
evaluations on three benchmarks reveal our approach sets a new state-of-the-art
with minimal computational costs. Code: https://github.com/TL-UESTC/UniMoS
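
To make the idea of modality separation concrete, here is a minimal sketch in NumPy. It is not the authors' implementation (see the repository above for that). It assumes, purely for illustration, that frozen CLIP image features are split by two linear projections into a language-associated and a vision-associated component, that the language branch is scored against class text embeddings, that the vision branch is scored by a linear classifier, and that the two predictions are mixed with a fixed weight. The class `ModalitySeparator`, its weight matrices, and the `weight` and `temperature` parameters are all hypothetical names; training, MET, and the modality discriminator are omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ModalitySeparator:
    """Hypothetical two-branch head on top of frozen CLIP image features.

    All weights are random here; in practice they would be learned.
    """

    def __init__(self, dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Assumed: one linear projection per modality-associated component.
        self.W_lang = rng.normal(0.0, scale, (dim, dim))
        self.W_vis = rng.normal(0.0, scale, (dim, dim))
        # Assumed: a linear classifier on the vision-associated component.
        self.W_cls = rng.normal(0.0, scale, (dim, num_classes))

    def split(self, feats):
        """Return (language-associated, vision-associated) components."""
        return feats @ self.W_lang, feats @ self.W_vis

    def predict(self, feats, text_emb, weight=0.5, temperature=0.01):
        """Mix the two branches' class probabilities with a fixed weight."""
        lac, vac = self.split(feats)
        # Language branch: cosine similarity to class text embeddings.
        lang_logits = l2_normalize(lac) @ l2_normalize(text_emb).T / temperature
        # Vision branch: plain linear classifier.
        vis_logits = vac @ self.W_cls
        return weight * softmax(lang_logits) + (1.0 - weight) * softmax(vis_logits)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(4, 8))     # stand-in for 4 CLIP image features
    text_emb = rng.normal(size=(3, 8))  # stand-in for 3 class text embeddings
    model = ModalitySeparator(dim=8, num_classes=3)
    probs = model.predict(feats, text_emb)
    print(probs.shape)  # one probability row per image
```

Because the output is a convex combination of two softmax distributions, each row remains a valid probability distribution regardless of the mixing weight.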