Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition
arXiv (2024)
Abstract
With the proliferation of social media posts in recent years, the need to
detect sentiments in multimodal (image-text) content has grown rapidly. Since
posts are user-generated, the image and text from the same post can express
different or even contradictory sentiments, leading to potential
sentiment discrepancy. However, existing works mainly adopt a
single-branch fusion structure that primarily captures the consistent sentiment
between image and text. Ignoring discrepant sentiment, or modeling it only
implicitly, compromises unimodal encoding and limits performance. In
this paper, we propose a semantics Completion and Decomposition (CoDe) network
to resolve the above issue. In the semantics completion module, we complement
image and text representations with the semantics of the OCR text embedded in
the image, helping bridge the sentiment gap. In the semantics decomposition
module, we decompose image and text representations with exclusive projection
and contrastive learning, thereby explicitly capturing the discrepant sentiment
between modalities. Finally, we fuse image and text representations by
cross-attention and combine them with the learned discrepant sentiment for
final classification. Extensive experiments on four multimodal sentiment
datasets demonstrate the superiority of CoDe over state-of-the-art methods.
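
The abstract does not define exclusive projection or the fusion step in detail, so the following is only a minimal sketch under stated assumptions, not the CoDe implementation. It assumes that "exclusive projection" means removing from one modality's vector its component along the other modality's vector, and it uses single-head scaled dot-product attention with no learned weights as a stand-in for cross-attention. The function names and toy vectors are hypothetical; the OCR-based completion and the contrastive loss are omitted.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def exclusive_projection(x, y, eps=1e-12):
    """Assumed reading of 'exclusive projection':
    subtract from x its component along y, keeping the part orthogonal to y."""
    coeff = dot(x, y) / (dot(y, y) + eps)
    return [xi - coeff * yi for xi, yi in zip(x, y)]

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention (no learned projections).
    Each query attends over keys_values, which serve as both keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys_values]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

# Toy (hypothetical) feature vectors for one post.
image = [1.0, 2.0, 0.0]
text = [1.0, 0.0, 0.0]

image_only = exclusive_projection(image, text)  # image part not shared with text
text_only = exclusive_projection(text, image)   # text part not shared with image
fused = cross_attention([text], [image, text])  # text attends over both vectors

# Concatenate fused and modality-exclusive parts as input to a classifier.
features = fused[0] + image_only + text_only
```

In this sketch the modality-exclusive vectors play the role of the "discrepant sentiment" features, and the concatenated `features` vector would be passed to a classifier head.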