Image paragraph captioning with topic clustering and topic shift prediction

KNOWLEDGE-BASED SYSTEMS(2024)

Abstract
Image paragraph captioning involves generating a semantically coherent paragraph describing an image's visual content. The selection and shifting of sentence topics are critical when a human describes an image. However, previous hierarchical image paragraph captioning methods have not fully explored or utilized sentence topics. In particular, the continuous and implicit modeling of topics in these methods makes it difficult to supervise the topic prediction process explicitly. We propose a new method called topic clustering and topic shift prediction (TCTSP) to solve this problem. Topic clustering (TC) in the sentence embedding space generates semantically explicit and discrete topic labels that can be directly used to supervise prediction. By introducing a topic shift probability matrix that characterizes human topic shift patterns, topic shift prediction (TSP) predicts subsequent topics that are both logical and consistent with human habits, based on visual features and language context. TCTSP can be combined with various image paragraph captioning model structures to improve performance. Extensive experiments were conducted on the Stanford image paragraph dataset, and superior results were reported compared with previous state-of-the-art approaches. In particular, TCTSP improved the consensus-based image description evaluation (CIDEr) performance of image paragraph captioning to 41.67%. The codes are available at https://github.com/tt0059/TCTSP.
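To make the idea of a topic shift probability matrix concrete, the sketch below estimates one from sequences of discrete topic labels. It is only an illustration under stated assumptions, not the authors' implementation: the function name `topic_shift_matrix`, the toy label sequences, and the first-order (previous topic to next topic) counting scheme are all hypothetical, and the labels are assumed to come from some prior clustering of sentence embeddings.

```python
def topic_shift_matrix(paragraphs, num_topics):
    """Estimate P[i][j] = P(next topic = j | current topic = i).

    `paragraphs` is a list of per-sentence topic label sequences,
    assumed here to be produced by clustering sentence embeddings.
    """
    counts = [[0] * num_topics for _ in range(num_topics)]
    for labels in paragraphs:
        # Count each consecutive (current, next) topic pair.
        for cur, nxt in zip(labels, labels[1:]):
            counts[cur][nxt] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: topics with no observed outgoing shift get a uniform row.
            matrix.append([1.0 / num_topics] * num_topics)
    return matrix


# Hypothetical topic labels for the sentences of three paragraphs.
paragraphs = [[0, 1, 2], [0, 1, 1], [0, 2]]
P = topic_shift_matrix(paragraphs, num_topics=3)
```

Each row of `P` is a distribution over the next topic given the current one. A predictor could, for instance, combine such a row with scores derived from visual features and language context, although how TCTSP actually combines them is not specified in this abstract.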
Keywords
Image paragraph captioning, Topic clustering, Topic shift prediction, Hierarchical supervision