Context-aware Difference Distilling for Multi-change Captioning

Annual Meeting of the Association for Computational Linguistics (2024)

Abstract
Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognition ability to reason about an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences. Given an image pair, CARD first decouples context features that aggregate all similar/dissimilar semantics, termed common/difference context features. Then, consistency and independence constraints are designed to guarantee the alignment/discrepancy of the common/difference context features. Further, the common context features guide the model to mine locally unchanged features, which are subtracted from the pair to distill local difference features. Next, the difference context features augment the local difference features to ensure that all changes are distilled. In this way, we obtain an omni-representation of all changes, which is translated into linguistic sentences by a transformer decoder. Extensive experiments on three public datasets show that CARD performs favourably against state-of-the-art methods. The code is available at https://github.com/tuyunbin/CARD.
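Since the abstract walks through a multi-step pipeline, a small sketch may help make it concrete. The following is only a minimal PyTorch-style illustration of the distilling steps as described above; all module names, tensor shapes, the similarity-based gating, and the exact loss formulations are assumptions made for exposition, not the authors' implementation (see the linked repository for the official code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferenceDistiller(nn.Module):
    """Distills an omni-representation of all changes from an image-feature pair."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # Projections that decouple common vs. difference context features.
        self.common_proj = nn.Linear(dim, dim)
        self.diff_proj = nn.Linear(dim, dim)

    def context(self, feat: torch.Tensor):
        # Pool region features into a common and a difference context vector.
        common = self.common_proj(feat).mean(dim=1, keepdim=True)  # (B, 1, D)
        diff = self.diff_proj(feat).mean(dim=1, keepdim=True)      # (B, 1, D)
        return common, diff

    def forward(self, feat_before: torch.Tensor, feat_after: torch.Tensor):
        # feat_*: (B, N, D) region features of the "before"/"after" images.
        c_b, d_b = self.context(feat_before)
        c_a, d_a = self.context(feat_after)
        common_ctx = 0.5 * (c_b + c_a)

        def unchanged(feat: torch.Tensor) -> torch.Tensor:
            # Gate each region by its similarity to the common context,
            # so regions resembling it are treated as locally unchanged.
            gate = torch.sigmoid((feat * common_ctx).sum(-1, keepdim=True))
            return gate * feat

        # Subtracting the unchanged parts distills local difference features,
        # which the difference contexts then augment into an omni-representation.
        local_diff = ((feat_after - unchanged(feat_after))
                      - (feat_before - unchanged(feat_before)))
        omni = local_diff + d_b + d_a
        return omni, (c_b, c_a, d_b, d_a)


def aux_losses(c_b, c_a, d_b, d_a):
    # Consistency: the common contexts of both images should align.
    consistency = F.mse_loss(c_b, c_a)
    # Independence: each difference context should be discrepant from its
    # common counterpart (low absolute cosine similarity).
    independence = (F.cosine_similarity(d_b, c_b, dim=-1).abs().mean()
                    + F.cosine_similarity(d_a, c_a, dim=-1).abs().mean())
    return consistency + independence


# Toy usage: omni would be fed to a transformer decoder to generate captions.
model = DifferenceDistiller(dim=512)
before, after = torch.randn(2, 49, 512), torch.randn(2, 49, 512)
omni, ctx = model(before, after)
loss = aux_losses(*ctx)
```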