Multi-Head Modularization to Leverage Generalization Capability in Multi-Modal Networks.

AAAI Conference on Artificial Intelligence(2022)

引用 1|浏览24
暂无评分
摘要
It has been crucial to leverage the rich information of multiple modalities in many tasks. Existing works have tried to design multi-modal networks with descent multi-modal fusion modules. Instead, we focus on improving generalization capability of multi-modal networks, especially the fusion module. Viewing the multi-modal data as different projections of information, we first observe that bad projection can cause poor generalization behaviors of multi-modal networks. Then, motivated by well-generalized network's low sensitivity to perturbation, we propose a novel multi-modal training method, multi-head modularization (MHM). We modularize a multi-modal network as a series of uni-modal embedding, multi-modal embedding, and task-specific head modules. Also, for training, we exploit multiple head modules learned with different datasets, swapping each other. From this, we can make the multi-modal embedding module robust to all the heads with different generalization behaviors. In testing phase, we select one of the head modules not to increase the computational cost. Owing to the perturbation of head modules, though including one selected head, the deployed network is more well-generalized compared to the simply end-to-end learned. We verify the effectiveness of MHM on various multi-modal tasks. We use the state-of-the-art methods as baselines, and show notable performance gain for all the baselines.
更多
查看译文
关键词
Machine Learning (ML)
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要