Bi-stream graph learning based multimodal fusion for emotion recognition in conversation

Nannan Lu, Zhiyuan Han, Min Han, Jiansheng Qian

Information Fusion (2024)

Abstract
Emotion Recognition in Conversation (ERC) is the process of automatically detecting and understanding emotions expressed in a conversation, and it plays an important role in human-computer interaction. A conversation produces data in several modalities, including words, tone of voice, and facial expressions. Multimodal ERC can fuse the information from these multiple views to comprehensively model emotion dynamics in a conversation. Graph Neural Networks (GNNs) are employed by multimodal ERC to learn intra-modal long-range contextual information and inter-modal interactions. However, fusing different modalities within a single graph may create conflicts among multimodal information and suffers from data heterogeneity. In this paper, we propose a novel Bi-stream Graph Learning based Multimodal Fusion (BiGMF) approach for ERC. It consists of a unimodal stream graph learning module for modeling intra-modal long-range contextual information and a cross-modal stream graph learning module for modeling inter-modal interactions; the two streams use GNNs to learn intra- and inter-modal information in parallel. This separated learning scheme can alleviate the conflict and heterogeneity in multimodal data fusion and promotes explicit modeling of cross-modal relations. Experimental results on two public datasets further verify the superiority of the proposed approach over state-of-the-art (SOTA) approaches.
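The bi-stream idea described in the abstract can be illustrated with a small sketch. This is not the authors' BiGMF implementation: it assumes every modality has already been projected to a shared feature dimension, replaces the learned GNN layers with a parameter-free mean aggregation, and uses hypothetical names (`mean_aggregate`, `unimodal_stream`, `crossmodal_stream`, `fuse`) and a hypothetical context `window`. It only shows how intra-modal and inter-modal graphs might be processed in parallel and then combined.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(features: List[Vector], neighbours: List[List[int]]) -> List[Vector]:
    """One parameter-free graph step: each node becomes the mean of itself and its neighbours."""
    out = []
    for i, nbrs in enumerate(neighbours):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        out.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return out


def unimodal_stream(feats: Dict[str, List[Vector]], window: int) -> Dict[str, List[Vector]]:
    """Intra-modal stream (assumed form): within each modality, connect
    utterances that are at most `window` turns apart."""
    result = {}
    for name, seq in feats.items():
        n = len(seq)
        nbrs = [
            [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]
            for i in range(n)
        ]
        result[name] = mean_aggregate(seq, nbrs)
    return result


def crossmodal_stream(feats: Dict[str, List[Vector]]) -> Dict[str, List[Vector]]:
    """Inter-modal stream (assumed form): for each utterance, connect its
    nodes across all modalities."""
    names = sorted(feats)
    n = len(feats[names[0]])
    result: Dict[str, List[Vector]] = {name: [] for name in names}
    for i in range(n):
        nodes = [feats[name][i] for name in names]
        nbrs = [[k for k in range(len(names)) if k != m] for m in range(len(names))]
        for name, vec in zip(names, mean_aggregate(nodes, nbrs)):
            result[name].append(vec)
    return result


def fuse(feats: Dict[str, List[Vector]], window: int = 1) -> List[Vector]:
    """Run both streams independently on the same inputs and concatenate
    their outputs per utterance."""
    intra = unimodal_stream(feats, window)
    inter = crossmodal_stream(feats)
    names = sorted(feats)
    n = len(feats[names[0]])
    fused = []
    for i in range(n):
        vec: Vector = []
        for name in names:
            vec += intra[name][i] + inter[name][i]
        fused.append(vec)
    return fused


# Toy conversation: 3 utterances, 3 modalities, 2-dimensional features (made-up values).
toy = {
    "text": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "audio": [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
    "visual": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
}
fused = fuse(toy, window=1)
print(len(fused), len(fused[0]))  # 3 utterances, 12 features each
```

Because the two streams never share a graph, heterogeneous modalities interact only through the explicit cross-modal edges, which is the separation the abstract argues for. A real model would replace the mean aggregation with trainable GNN layers and feed the fused vectors to an emotion classifier.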
Keywords
Emotion recognition in conversation, Multimodal fusion, Graph neural networks, Contextual information, Inter-modal interaction