Target and source modality co-reinforcement for emotion understanding from asynchronous multimodal sequences

Knowledge-Based Systems (2023)

Abstract
Perceiving human emotions from a multimodal perspective has received significant attention in the knowledge engineering community. Because sequences from different modalities are received at different sampling rates, multimodal streams are inherently asynchronous. Most previous methods performed manual sequence alignment before multimodal fusion, which ignored long-range dependencies among modalities and failed to learn reliable crossmodal element correlations. Inspired by the human perception paradigm, we propose a target and source Modality Co-Reinforcement (MCR) approach that achieves sufficient crossmodal interaction and fusion at different granularities. Specifically, MCR introduces two types of target modality reinforcement units that jointly reinforce the multimodal representations: these units enhance emotion-related knowledge exchange in fine-grained interactions and capture emotionally expressive crossmodal elements in mixed-grained interactions. Moreover, a source modality update module provides meaningful features for the crossmodal fusion of the target modalities. Through these components, the multimodal representations are progressively reinforced and improved. Comprehensive experiments are conducted on three multimodal emotion understanding benchmarks. Quantitative results show that MCR significantly outperforms previous state-of-the-art methods in both word-aligned and unaligned settings, and qualitative analysis and visualization further demonstrate the superiority of the proposed modules.
Keywords
Emotion understanding, Knowledge exchange, Multimodal fusion, Crossmodal interaction, Modality co-reinforcement
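The abstract does not give implementation details, but the key mechanism it describes, a target modality attending to an unaligned source modality without manual alignment, is commonly built on transformer-style crossmodal attention. Below is a minimal, hypothetical sketch of one such reinforcement step under that assumption; the class, module, and parameter names are illustrative and not taken from the paper.

```python
# Minimal sketch of a crossmodal reinforcement step, assuming a standard
# transformer-style crossmodal attention unit. This is NOT the paper's
# actual MCR implementation; all names here are hypothetical.
import torch
import torch.nn as nn

class CrossmodalReinforcementUnit(nn.Module):
    """A target sequence attends to a source sequence of a different
    length, so no manual word-level alignment is required."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_out = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target: (B, T_t, dim), source: (B, T_s, dim); T_t != T_s is fine,
        # which is how asynchronous (unaligned) sequences are handled.
        q, kv = self.norm_q(target), self.norm_kv(source)
        reinforced, _ = self.attn(q, kv, kv)   # target queries the source
        target = target + reinforced           # residual reinforcement
        return target + self.ffn(self.norm_out(target))

# Usage: reinforce a text (target) stream with audio (source) features.
unit = CrossmodalReinforcementUnit(dim=64)
text = torch.randn(2, 50, 64)    # 50 text tokens
audio = torch.randn(2, 120, 64)  # 120 audio frames, unaligned length
out = unit(text, audio)          # -> (2, 50, 64)
```

Stacking such units per target modality, and updating the source representations between layers, would give the kind of progressive co-reinforcement the abstract describes.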