Co-attention Network for Visual Question Answering Based on Dual Attention

Journal of Engineering Science and Technology Review (2021)

Abstract
Attention mechanisms are widely used to process modal features in visual question answering (VQA) tasks. However, attention bias can misalign key targets between modalities, which reduces VQA accuracy. To align key targets accurately between the image and text modalities, a co-attention network based on a dual attention mechanism was proposed. First, the dual attention mechanism was used to accurately localize key targets within each modality. Then, co-attention was employed to continuously fuse image and text features. Finally, key targets were aligned between the modalities. Extensive experiments verified the validity of the model. Results demonstrate that the dual attention mechanism, built on existing attention, can accurately locate targets within a modality. Modal fusion through image-guided text and text-guided image co-attention improves the alignment of key targets between modalities to some extent. Compared with several existing classic VQA models, the proposed model improves overall performance by 0.14%–5.69%. This study provides a reference for improving VQA performance through target alignment between the image and text modalities.
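The abstract does not give the model's equations. As a rough illustration of the general idea only, the following sketch assumes plain scaled dot-product attention with identity projections. Self-attention within each modality stands in for the paper's dual attention mechanism, whose exact form the abstract does not specify, and cross-attention in both directions stands in for the image-guided text and text-guided image co-attention. The function names `attend` and `co_attention` and the toy feature vectors are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one feature vector per row (image region or word)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention with identity projections (a simplification).

    Returns the attended outputs (one row per query) and the attention
    weights (one row per query, one column per key).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, column by column.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


def co_attention(image: Matrix, text: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical two-step co-attention: intra-modal, then inter-modal."""
    # Step 1 (intra-modal): self-attention as a stand-in for dual attention.
    image_self, _ = attend(image, image, image)
    text_self, _ = attend(text, text, text)
    # Step 2 (inter-modal): text-guided image attention, where words query image regions.
    text_guided_image, _ = attend(text_self, image_self, image_self)
    # Image-guided text attention, where image regions query words.
    image_guided_text, _ = attend(image_self, text_self, text_self)
    return text_guided_image, image_guided_text


if __name__ == "__main__":
    image_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy regions, d = 2
    question_words = [[1.0, 0.2], [0.1, 0.9]]             # 2 toy words, d = 2
    tgi, igt = co_attention(image_regions, question_words)
    print(len(tgi), len(igt))  # prints "2 3"
```

Learned query, key, and value projections, multiple attention heads, residual connections, and stacked layers, as found in many co-attention VQA models, are omitted here to keep the sketch short; a real model would learn these so that the alignment between question words and image regions is trained rather than fixed by raw feature similarity.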
Keywords
visual question answering, dual, co-attention