MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
CoRR (2024)
Abstract
Memes have evolved into a prevalent medium for diverse communication, ranging
from humour to propaganda. With the rising popularity of image-focused content,
there is a growing need to examine their potential for harm from different
angles. Previous studies have analyzed memes in closed settings: detecting harm,
applying semantic labels, and offering natural language explanations. To extend
this research, we introduce MemeMQA, a multimodal question-answering framework
that aims to elicit accurate responses to structured questions while providing
coherent explanations. We curate MemeMQACorpus, a new dataset featuring 1,880
questions related to 1,122 memes, each with a corresponding answer-explanation
pair. We further propose ARSENAL, a novel two-stage multimodal framework that
leverages the reasoning capabilities of LLMs to address MemeMQA. We benchmark
MemeMQA using competitive baselines and demonstrate ARSENAL's superiority over
the best baseline: ~18% higher answer prediction accuracy and a clear lead in
text generation across various metrics measuring lexical and semantic
alignment. We analyze ARSENAL's robustness through diversification of the
question set, confounder-based evaluation of MemeMQA's generalizability, and
modality-specific assessment, enhancing our understanding of meme
interpretation in the multimodal communication landscape.
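The abstract describes a question-answering setup in which each meme question is paired with an answer and an explanation, and a two-stage approach that first draws on an LLM's reasoning and then produces the answer. The sketch below illustrates that generic "rationale first, answer second" pattern together with exact-match answer accuracy. It is not ARSENAL's implementation: the record fields (`meme_id`, `meme_text`, `options`), the assumption that questions come with a list of candidate answers, and the helper names `two_stage_predict` and `accuracy` are all hypothetical, and both stages are plain callables so that any model, or a toy rule, can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class MemeQuestion:
    """One hypothetical MemeMQA-style record (field names are illustrative)."""
    meme_id: str
    meme_text: str            # text overlaid on the meme image (e.g. from OCR)
    question: str
    options: Sequence[str]    # assumed list of candidate answers


@dataclass
class Prediction:
    rationale: str
    answer: str
    explanation: str


# Both stages are injected as callables; this is an assumed interface.
RationaleFn = Callable[[MemeQuestion], str]
AnswerFn = Callable[[MemeQuestion, str], Tuple[str, str]]


def two_stage_predict(item: MemeQuestion,
                      generate_rationale: RationaleFn,
                      answer_with_rationale: AnswerFn) -> Prediction:
    # Stage 1: produce a free-text rationale about the meme and the question.
    rationale = generate_rationale(item)
    # Stage 2: choose an answer and an explanation, conditioned on the rationale.
    answer, explanation = answer_with_rationale(item, rationale)
    if answer not in item.options:
        raise ValueError(f"answer {answer!r} is not one of the options")
    return Prediction(rationale, answer, explanation)


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose predicted answer exactly matches the gold one."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty answer lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Toy stand-ins for the two model stages (purely illustrative).
    def toy_rationale(item: MemeQuestion) -> str:
        return f"The meme text '{item.meme_text}' refers to a candidate entity."

    def toy_answer(item: MemeQuestion, rationale: str) -> Tuple[str, str]:
        # Pick the first option mentioned in the meme text, else the first option.
        for option in item.options:
            if option.lower() in item.meme_text.lower():
                return option, f"'{option}' appears in the meme text. {rationale}"
        return item.options[0], rationale

    item = MemeQuestion("m1", "When the senator skips the vote again",
                        "Who is being targeted in this meme?",
                        ["voters", "senator"])
    pred = two_stage_predict(item, toy_rationale, toy_answer)
    print(pred.answer)                           # senator
    print(accuracy([pred.answer], ["senator"]))  # 1.0
```

Keeping the stages as injected functions separates the rationale step from the answering step, which is the point of a two-stage design; swapping the toy rules for real models would not change the pipeline code.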