Multimodal Cross-Attention Bayesian Network for Social News Emotion Recognition.

Xinzhi Wang, Mengyue Li, Yudong Chang, Xiangfeng Luo, Yige Yao, Zhichao Li

IJCNN (2023)

Abstract
Multimodal emotion recognition comprehensively identifies the emotion contained in multimodal data by bridging the gaps between heterogeneous datasets. In recent years, multimodal emotion recognition methods have gained significant attention and have been shown to surpass single-modal approaches. Most existing multimodal emotion analysis methods simply combine different modalities to improve recognition of consistent emotion expressions across multimodal data. However, it remains challenging to recognize the correct emotion when the contents of different modalities carry inconsistent or even contradictory emotions. To solve this problem, we propose a novel image-text emotion recognition model named Multimodal Cross-Attention Bayesian Network (MCABN). The entire network exploits Bayesian theory to learn the distribution of its weight parameters, making the optimization directions and resulting parameters interpretable. Moreover, the model leverages the consistency and complementarity between visual content and textual description to arrive at accurate decisions. Specifically, for each modality, multiple explainable features (color, texture, and shape features for the image; adjective, adverb, verb, noun, and negative features for the text) and one unexplainable feature are fused into its feature representation to reinforce emotion-related features. Two single-modal attention modules (Visual Attention Module and Textual Attention Module) then capture the most discriminative features within a single image and text, while two cross-modal attention modules (Image-guided Text Attention Module and Text-guided Image Attention Module) extract the complementary and dominant features between the two modalities through interactive learning. Finally, the outputs of the four attention modules are integrated through intermediate fusion to predict the final emotion. Experimental results on the NVTD and MVSA-Multiple datasets indicate that the proposed MCABN outperforms state-of-the-art baselines by substantial margins.
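To make the described architecture concrete, the following is a minimal PyTorch-style sketch of how a cross-modal attention module (e.g., text-guided image attention) and the intermediate fusion of the four attention outputs could be wired together. It is an illustrative assumption, not the authors' implementation: module names, feature dimensions, the mean-pooling step, and the linear classification head are hypothetical, and the Bayesian treatment of weight parameters in the full MCABN is omitted.

```python
# Hypothetical sketch (not the authors' code): cross-modal attention and
# intermediate fusion, assuming pre-extracted image/text feature sequences
# that share a common dimension d_model.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Let one modality query another, e.g. text-guided image attention."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, query_feats, context_feats):
        # query_feats:   (B, Lq, d) features of the guiding modality (e.g. text)
        # context_feats: (B, Lk, d) features of the attended modality (e.g. image)
        out, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(out + query_feats)  # residual connection + layer norm


class IntermediateFusion(nn.Module):
    """Pool and concatenate the four attention outputs, then classify emotion."""

    def __init__(self, d_model: int = 256, n_classes: int = 3):
        super().__init__()
        self.classifier = nn.Linear(4 * d_model, n_classes)

    def forward(self, visual, textual, img_guided_text, text_guided_img):
        # Each input: (B, L, d). Mean-pool over the sequence dimension,
        # concatenate the four pooled vectors, and predict emotion logits.
        pooled = [x.mean(dim=1) for x in
                  (visual, textual, img_guided_text, text_guided_img)]
        return self.classifier(torch.cat(pooled, dim=-1))
```

In this sketch, the two single-modal modules would feed `visual` and `textual`, while the two cross-modal modules would feed `img_guided_text` and `text_guided_img`, matching the four-branch intermediate fusion described in the abstract.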
Keywords
Bayesian Neural Network, Multimodal Emotion Recognition, Attention Mechanism, Feature Fusion