Fuser: An enhanced multimodal fusion framework with congruent reinforced perceptron for hateful memes detection

Fan Wu, Bin Gao, Xiaoou Pan, Linlin Li, Yujiao Ma, Shutian Liu, Zhengjun Liu

Information Processing & Management (2024)

Abstract
As a multimodal form of hate speech on social media, hateful memes pose aggressive yet cryptic threats to people's real lives. Automatic detection of hateful memes is crucial, but the images and texts in most memes are only weakly consistent or even mutually irrelevant. Although existing works have achieved the initial goal of detecting hateful memes with pre-trained models, they are limited to monolithic inference methods and ignore the semantic differences between multimodal representations. To strengthen the comprehension and reasoning of the hidden meaning behind memes by incorporating real-world knowledge, we propose an enhanced multimodal fusion framework with a congruent reinforced perceptron for hateful memes detection. Inspired by the human cognitive mechanism, we first divide the extracted multisource representations into main semantics and auxiliary contexts according to their strength and relevance, and then precode them into lightly correlated embeddings with unified spatial dimensions via a novel prefix uniform layer. To jointly learn the intrinsic correlation between primary and secondary semantics, a congruent reinforced perceptron with brain-like perceptual integration is designed to seamlessly fuse multimodal representations in a shared latent space while maintaining feature integrity in the sub-fusion space, thereby implicitly reasoning about the subtle metaphors behind the memes. Extensive experiments on four benchmark datasets demonstrate the effectiveness and superiority of our architecture compared with previous state-of-the-art methods.
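The fusion pipeline the abstract outlines — projecting each modality into a shared space and then fusing main and auxiliary semantics — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the dimensions are invented, the "prefix uniform layer" is approximated by per-modality linear projections, and a simple learned gate stands in for the congruent reinforced perceptron, whose exact formulation the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only): image encoder, text encoder,
# and the unified shared space the paper's prefix uniform layer maps into.
d_img, d_txt, d = 512, 768, 256

# Per-modality projections into one d-dimensional space — a stand-in
# for the prefix uniform layer described in the abstract.
W_main = rng.standard_normal((d_img, d)) * 0.02
W_aux = rng.standard_normal((d_txt, d)) * 0.02

img = rng.standard_normal(d_img)   # main semantics (e.g., image features)
txt = rng.standard_normal(d_txt)   # auxiliary context (e.g., caption features)

h_main = img @ W_main              # both now live in the shared space
h_aux = txt @ W_aux

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gated fusion: a learned gate weighs main vs. auxiliary semantics,
# a simplified proxy for the congruent reinforced perceptron.
W_g = rng.standard_normal((2 * d, d)) * 0.02
g = sigmoid(np.concatenate([h_main, h_aux]) @ W_g)
fused = g * h_main + (1.0 - g) * h_aux  # shared latent representation

print(fused.shape)  # (256,)
```

In practice the projections and gate would be trained end-to-end against the hateful/non-hateful labels; the sketch only shows how differently sized modality embeddings can be aligned and combined in one latent space.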
Keywords
Hateful memes detection,Multimodal fusion,Congruent reinforced perceptron,Main semantic,Auxiliary context