Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
arXiv (2024)
Abstract
The Composed Image Retrieval (CIR) task aims to retrieve target images using
a composed query consisting of a reference image and a modified text. Advanced
methods often utilize contrastive learning as the optimization objective, which
benefits from adequate positive and negative examples. However, triplets for
CIR incur high manual annotation costs, resulting in limited positive
examples. Furthermore, existing methods commonly use in-batch negative
sampling, which limits the number of negatives available to the model. To
address the lack of positives, we propose a data generation method that
leverages a multi-modal large language model to construct triplets for CIR. To
introduce more negatives during fine-tuning, we design a two-stage fine-tuning
framework for CIR, whose second stage introduces plenty of static
representations of negatives to optimize the representation space rapidly. The
above two improvements can be effectively stacked and designed to be
plug-and-play, easily applied to existing CIR models without changing their
original architectures. Extensive experiments and ablation analysis demonstrate
that our method effectively scales positives and negatives and achieves
state-of-the-art results on both the FashionIQ and CIRR datasets. In addition,
our method also performs well in zero-shot composed image retrieval, providing
a new CIR solution for low-resource scenarios.
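The core idea of the second fine-tuning stage is to extend the in-batch negatives with a large set of precomputed ("static") negative representations inside a contrastive objective. A minimal numpy sketch of such an InfoNCE-style loss is shown below; the function name, the `static_negs` bank, and the temperature value are illustrative assumptions, not the authors' actual API.

```python
import numpy as np

def info_nce_with_static_negatives(query, positive, in_batch_negs,
                                   static_negs, temperature=0.07):
    """InfoNCE-style loss for one composed query.

    `static_negs` stands in for a bank of precomputed negative
    representations that augment the in-batch negatives, in the
    spirit of the paper's second-stage fine-tuning. All names here
    are hypothetical placeholders for illustration.
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = l2norm(query)
    # Candidate set: the positive target, then in-batch negatives,
    # then the (much larger) static negative bank.
    cands = l2norm(np.vstack([positive, in_batch_negs, static_negs]))
    logits = cands @ q / temperature   # scaled cosine similarities
    logits -= logits.max()             # numeric stability for softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])           # the positive sits at index 0
```

Because the static negatives are fixed embeddings rather than freshly encoded samples, enlarging the candidate set this way adds almost no forward-pass cost, which is what allows the representation space to be optimized rapidly with many negatives.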