Grad-Cam Aware Supervised Attention for Visual Question Answering for Post-Disaster Damage Assessment.

ICIP (2022)

Abstract
In this paper, we present a Grad-Cam aware supervised attention framework for visual question answering (VQA) in post-disaster damage assessment. Visual attention in VQA aims to focus on the image regions relevant to a question in order to predict the answer. However, conventional attention mechanisms in VQA are unsupervised: they learn to weight visual content by minimizing only the task-specific loss. This approach fails to provide appropriate visual attention when the visual content is very complex. The UAV images in the FloodNet-VQA dataset are very complex in content and nature, as they depict hazardous scenes after Hurricane Harvey from a high altitude. To tackle this, we propose a supervised attention mechanism that uses explainable features from Grad-Cam to supervise visual attention in the VQA pipeline. The proposed mechanism operates in two stages. In the first stage, we derive visual explanations through Grad-Cam by training a baseline attention-based VQA model. In the second stage, we supervise the attention over visual content for each question by incorporating the Grad-Cam explanations from the first stage. Our approach improves performance over state-of-the-art VQA models by a considerable margin on the FloodNet dataset.
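The second stage described above can be illustrated with a minimal sketch. The abstract does not state the exact form of the supervision term, so everything here is an assumption made for illustration: the use of a KL divergence between the Grad-Cam map and the model's attention, the weighting factor `lam`, and the function names `normalize_map`, `attention_supervision_loss`, and `total_loss` are all hypothetical and not taken from the paper.

```python
import numpy as np

def normalize_map(saliency, eps=1e-8):
    """Turn a raw saliency map (e.g. from stage one) into a distribution over regions."""
    s = np.maximum(np.asarray(saliency, dtype=float), 0.0).reshape(-1)
    total = s.sum()
    if total < eps:
        # Degenerate map: fall back to a uniform target.
        return np.full(s.size, 1.0 / s.size)
    return s / total

def softmax(x):
    z = x - x.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_supervision_loss(attn_logits, gradcam_map, eps=1e-8):
    """KL(target || attention). Assumed form; the paper may use a different term."""
    target = normalize_map(gradcam_map)
    attn = softmax(np.asarray(attn_logits, dtype=float).reshape(-1))
    return float(np.sum(target * (np.log(target + eps) - np.log(attn + eps))))

def total_loss(task_loss, attn_logits, gradcam_map, lam=0.5):
    """Task-specific loss plus a weighted attention-supervision term (lam is hypothetical)."""
    return task_loss + lam * attention_supervision_loss(attn_logits, gradcam_map)

# Toy 2x2 grid of image regions: the stage-one map highlights the bottom-right region.
cam = np.array([[1.0, 1.0], [1.0, 5.0]])
uniform_logits = np.zeros((2, 2))          # attention spread evenly over regions
matched_logits = np.log(cam / cam.sum())   # attention that already matches the map

print(attention_supervision_loss(uniform_logits, cam))  # positive penalty
print(attention_supervision_loss(matched_logits, cam))  # approximately zero
```

In this sketch the supervision term is positive when attention is spread evenly and vanishes when attention already agrees with the stage-one map, so minimizing the combined loss pushes attention toward the regions the explanation marks as relevant.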
Keywords
attention, visual question, assessment, damage, grad-cam, post-disaster