Enhance understanding and reasoning ability for image captioning

Applied Intelligence(2022)

引用 4|浏览13
暂无评分
摘要
Image captioning aims to generate a grammatically correct and semantically accurate natural language description of a given image. To better capture the complex information contained in an image and expand the relevant external knowledge outside the image to generate a better image caption, this paper proposes an end-to-end image captioning framework named Enhance Understanding and Reasoning Ability for Image Captioning (EURAIC) based on the Transformer model. EURAIC provides an enhanced visual understanding ability and caption reasoning ability to improve image captioning performance. To achieve this goal, we use the semantic features of the core objects detected from the image to guide the visual features, which incorporate information on the spatial position relationships between the objects. Then, we introduce an external knowledge network to obtain information other than the inherent content of the image. In this way, a high-quality image caption sentence can be generated for the given image. Experiments on the MSCOCO dataset prove that our method is superior to the baseline model and comparable to other state-of-the-art methods.
更多
查看译文
关键词
Image captioning, Transformer, Spatial position relationship, Semantic feature, External knowledge
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要