Video multimodal sentiment analysis using cross-modal feature translation and dynamical propagation

Knowledge-Based Systems (2024)

Abstract
Multimodal sentiment analysis on social platforms is crucial for comprehending public opinions and attitudes, thus garnering substantial interest in knowledge engineering. Existing methods like implicit interaction, explicit interaction, and cross-modal translation can effectively integrate sentiment information, but they encounter challenges in establishing efficient emotional correlations across modalities due to data heterogeneity and concealed emotional relationships. To tackle this issue, we propose a video multimodal sentiment analysis model called PEST, which leverages cross-modal feature translation and a dynamic propagation model. Specifically, cross-modal feature translation translates textual, visual, and acoustic features into a common feature space, eliminating heterogeneity and enabling initial modal interaction. Additionally, the dynamic propagation model facilitates in-depth interaction and aids in establishing stable and reliable emotional correlations across modalities. Extensive experiments on the three multimodal sentiment datasets, CMU-MOSI, CMU-MOSEI, and CH-SIMS, demonstrate that PEST exhibits superior performance in both word-aligned and unaligned settings.
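The abstract describes two stages: projecting textual, visual, and acoustic features into a common feature space to remove heterogeneity, followed by a propagation step that deepens cross-modal interaction. The sketch below is not the authors' implementation; the dimensions, linear projections, and the simple averaging-style propagation rule are illustrative assumptions intended only to show the general shape of such a pipeline.

```python
# Minimal sketch (assumed design, not PEST itself): per-modality projections
# into a shared space, then a toy iterative propagation that mixes modalities.
import torch
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    def __init__(self, dims, common_dim=128):
        super().__init__()
        # One linear projection per modality into the common feature space.
        self.proj = nn.ModuleDict(
            {name: nn.Linear(d, common_dim) for name, d in dims.items()}
        )

    def forward(self, feats):
        # feats: dict of modality name -> (batch, modality_dim) tensor
        return {name: self.proj[name](x) for name, x in feats.items()}

def propagate(common_feats, steps=3, alpha=0.5):
    # Toy propagation: each modality is repeatedly pulled toward the mean of
    # all modalities, standing in for the paper's dynamic propagation model.
    feats = dict(common_feats)
    for _ in range(steps):
        mean = torch.stack(list(feats.values())).mean(dim=0)
        feats = {n: (1 - alpha) * f + alpha * mean for n, f in feats.items()}
    return feats

if __name__ == "__main__":
    dims = {"text": 768, "visual": 35, "acoustic": 74}  # assumed feature sizes
    batch = {n: torch.randn(4, d) for n, d in dims.items()}
    fused = propagate(CommonSpaceProjector(dims)(batch))
    print({n: tuple(t.shape) for n, t in fused.items()})  # all (4, 128)
```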
Key words
Video multimodal sentiment analysis, Public emotion feature, Cross-modal feature translation, Dynamical propagation model