Question-Guided Graph Convolutional Network for Visual Question Answering Based on Object-Difference

Minchang Huangfu, Yushui Geng

2023 IEEE Smart World Congress (SWC), 2023

Abstract
The diversity of questions in visual question answering (VQA) poses a new challenge for models that must predict the answer. Existing models focus on constructing new attention mechanisms and on object recognition, but neglect the visual and semantic relationships between objects. To address this problem, we propose a graph convolutional network based on fine-grained question and object-difference (FQOD-GCN). First, the question is represented in a fine-grained way, so that the fine-grained question features can effectively identify the relationships between objects in the image. Then, guided by these question features, the differences between visual objects are computed to learn the semantic relationships relevant to the question, and an object relationship graph is constructed from the object differences. Finally, a graph convolutional network attends to the neighborhood information of each object in the object graph, which emphasizes the objects related to the question and reduces redundancy. On the VQA 2.0 dataset, our model performs 3% to 4% better than the classical method, which demonstrates its effectiveness.
Keywords
visual question answering, object difference, fine-grained representation, graph convolutional network
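
The pipeline outlined in the abstract (question-guided object differences, a relationship graph built from them, then graph convolution over each object's neighborhood) can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration and not the authors' FQOD-GCN: the scoring function (question-weighted absolute difference), the softmax row normalisation, the toy feature vectors, the identity weight matrix, and the function names `question_guided_adjacency` and `gcn_layer`.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_adjacency(objects, question):
    """Build a row-normalised adjacency matrix from object differences.

    Assumed scoring: score(i, j) = sum_k question[k] * |obj_i[k] - obj_j[k]|.
    This is a stand-in for the paper's (unstated) difference function.
    """
    adjacency = []
    for obj_i in objects:
        scores = []
        for obj_j in objects:
            diff = [abs(a - b) for a, b in zip(obj_i, obj_j)]
            scores.append(sum(q * d for q, d in zip(question, diff)))
        # Each row becomes a distribution over the object's neighbours.
        adjacency.append(softmax(scores))
    return adjacency


def gcn_layer(adjacency, features, weight):
    """One graph-convolution step: ReLU(A @ H @ W), written with plain lists."""
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Aggregate neighbour features: A @ H.
    agg = [
        [sum(adjacency[i][k] * features[k][c] for k in range(n)) for c in range(d_in)]
        for i in range(n)
    ]
    # Linear projection followed by ReLU: max(0, (A @ H) @ W).
    return [
        [max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in))) for o in range(d_out)]
        for i in range(n)
    ]


# Toy inputs (hypothetical): three detected objects with 2-d features,
# and a 2-d question vector that weights the first feature more heavily.
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.5]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for readability

adjacency = question_guided_adjacency(objects, question)
updated = gcn_layer(adjacency, objects, weight)
```

In this sketch each object's updated feature is a question-dependent mixture of the other objects' features, which is the general idea of letting the question decide which neighbours matter. A trained model would learn the projection weights and, most likely, a richer difference function.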