Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models
CoRR (2024)
Abstract
The age of social media is flooded with Internet memes, necessitating a clear
grasp and effective identification of harmful ones. This task presents a
significant challenge due to the implicit meaning embedded in memes, which is
not explicitly conveyed through the surface text and image. However, existing
harmful meme detection methods do not present readable explanations that unveil
such implicit meaning to support their detection decisions. In this paper, we
propose an explainable approach to detect harmful memes, achieved through
reasoning over conflicting rationales from both harmless and harmful positions.
Specifically, inspired by the powerful capacity of Large Language Models (LLMs)
on text generation and reasoning, we first elicit multimodal debate between
LLMs to generate the explanations derived from the contradictory arguments.
Then we propose to fine-tune a small language model as the debate judge for
harmfulness inference, to facilitate multimodal fusion between the harmfulness
rationales and the intrinsic multimodal information within memes. In this way,
our model is empowered to perform dialectical reasoning over intricate and
implicit harm-indicative patterns, utilizing multimodal explanations
originating from both harmless and harmful arguments. Extensive experiments on
three public meme datasets demonstrate that our harmful meme detection approach
substantially outperforms state-of-the-art methods and is better able to
explain the harmfulness judgments behind its predictions.
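
The two-stage pipeline described in the abstract (a debate that yields one rationale per stance, followed by a judge that reasons over both) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Debater`, `DebateRecord`, `run_debate`, `build_judge_input`, and `stub_debater` are all hypothetical, the LLM call is replaced by a stub that returns a canned string, and only the text side of the judge input is shown. The fine-tuned small language model and its fusion with image features are omitted.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (meme_text, image_ref, stance) -> rationale string.
# In practice this would wrap a call to a multimodal LLM.
Debater = Callable[[str, str, str], str]

# The two opposing positions named in the abstract.
STANCES = ("harmless", "harmful")


@dataclass
class DebateRecord:
    meme_text: str
    image_ref: str              # assumed stand-in for however the image is supplied
    rationales: dict[str, str]  # stance -> rationale


def run_debate(meme_text: str, image_ref: str, debater: Debater) -> DebateRecord:
    """Collect one rationale per stance for the same meme."""
    rationales = {s: debater(meme_text, image_ref, s) for s in STANCES}
    return DebateRecord(meme_text, image_ref, rationales)


def build_judge_input(record: DebateRecord) -> str:
    """Serialize the meme text and both rationales into one text input for a judge."""
    lines = [f"Meme text: {record.meme_text}"]
    for stance in STANCES:
        lines.append(f"Argument that the meme is {stance}: {record.rationales[stance]}")
    lines.append("Is the meme harmful or harmless?")
    return "\n".join(lines)


def stub_debater(meme_text: str, image_ref: str, stance: str) -> str:
    # Placeholder for a multimodal LLM call; returns a canned string.
    return f"({stance} rationale for '{meme_text}' with image {image_ref})"


record = run_debate("example caption", "meme_001.png", stub_debater)
judge_input = build_judge_input(record)
print(judge_input)
```

Keeping the conflicting rationales side by side in a single judge input mirrors the dialectical setup the abstract describes; how the actual judge consumes the image alongside this text is left open here.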