Self-Attention Based Visual Dialogue

Vaibhav Mathur*, Divyansh Jha, Sunil Kumar

International Journal of Recent Technology and Engineering (2019)

Abstract
We improved performance on the task of Visual Dialogue by integrating a self-attention mechanism into the model reported in the original Visual Dialogue paper. Visual Dialogue differs from other downstream tasks and serves as a general test of machine intelligence: the model must be proficient in both vision and language so that individual answers, and the dialogue as it develops, can be assessed. We use the VisDial v0.9 dataset, collected by Georgia Tech, with the same train/test splits as the original paper. It contains approximately 1.2 million question-answer pairs in total: ten question-answer pairs for each of ~120,000 images from COCO. To keep the comparison fair and simple, we use the same encoder-decoder architecture, namely the Late-Fusion Encoder with a Discriminative Decoder, and add the self-attention module from the SAGAN paper to the encoder. The inclusion of the self-attention module was motivated by the observation that many answers from the visual dialogue model were based solely on the questions asked and not on the image. Our hypothesis is that the self-attention module will make the model attend to the image while generating an answer.
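The SAGAN-style self-attention module mentioned above can be sketched as follows. This is a minimal NumPy illustration of the mechanism, not the paper's actual code: the projection matrices `w_f`, `w_g`, `w_h` and the flattened `(channels, locations)` feature layout are assumptions for clarity. The key ingredients are pairwise affinities between spatial locations, a softmax over keys, and a residual connection scaled by a learnable `gamma` initialized to zero so the block starts as an identity.

```python
import numpy as np

def sagan_self_attention(x, w_f, w_g, w_h, gamma=0.0):
    """SAGAN-style self-attention over a flattened feature map.

    x:        (C, N) features, N spatial locations with C channels each.
    w_f, w_g: (C', C) query/key projections (C' < C in SAGAN).
    w_h:      (C, C) value projection.
    gamma:    learnable scalar; 0.0 makes the block an identity at init.
    Names and shapes are illustrative, not taken from the paper's code.
    """
    f = w_f @ x                                    # queries, (C', N)
    g = w_g @ x                                    # keys,    (C', N)
    h = w_h @ x                                    # values,  (C, N)
    logits = f.T @ g                               # (N, N) location affinities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over key locations
    o = h @ attn.T                                 # attention-weighted values, (C, N)
    return gamma * o + x                           # residual connection
```

Because `gamma` starts at zero, attaching this block to a pretrained encoder initially leaves its outputs unchanged; the network then learns how much attention to apply during training.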