RNIC: A retrospect network for image captioning

Soft Computing (2022)

Abstract
As a cross-domain research area combining computer vision and natural language processing, image captioning has mainly focused on improving visual features; far less attention has been paid to exploiting the inherent properties of language to boost captioning performance. To address this gap, we propose a textual attention mechanism that obtains the semantic relevance between words by scanning all previously generated words. The retrospect network for image captioning (RNIC) proposed in this paper uses textual attention to improve both the input and the prediction processes. Concretely, the textual attention mechanism is applied to the model simultaneously with the visual attention mechanism, providing the model's input with the maximum information required for generating captions. In this way, our model learns to collaboratively attend to both visual and textual features. Moreover, the semantic relevance between words obtained through retrospection serves as the basis for prediction, so that the decoder can emulate the human language system and make better predictions conditioned on the already generated content. We evaluate the effectiveness of our model on the COCO image captioning dataset and achieve superior performance over previous methods.
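The retrospect step described in the abstract can be read as attention over the states of all already-generated words: the current decoder state queries the history, and the resulting weighted sum becomes a textual context vector for the next prediction. The sketch below is an illustrative reading using scaled dot-product attention in plain Python; the function and variable names are assumptions, not the paper's exact formulation.

```python
import math

def textual_attention(query, word_states):
    """Hypothetical sketch of retrospect-style textual attention.

    query       -- current decoder state (list of floats)
    word_states -- states of all previously generated words
    Returns the textual context vector and the attention weights.
    """
    d = len(query)
    # Scaled dot-product score between the decoder query and each past word.
    scores = [sum(q * w for q, w in zip(query, state)) / math.sqrt(d)
              for state in word_states]
    # Softmax over the generated-word history (numerically stabilized).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of past word states = textual context vector.
    context = [sum(w * state[i] for w, state in zip(weights, word_states))
               for i in range(d)]
    return context, weights
```

In RNIC this textual context would be combined with the visual attention context at each decoding step, so the prediction is conditioned on both the image and the generated prefix.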
Keywords
LSTM, Image caption, Visual attention, Textual attention, Retrospect