Exploring CLIP for Real World, Text-based Image Retrieval

Manal Sultan, Lia Jacobs, Abby Stylianou, Robert Pless

Applied Imagery Pattern Recognition Workshop (2023)

Abstract
We consider the ability of CLIP features to support text-driven image retrieval. Traditional image-based queries sometimes misalign with user intentions due to their focus on irrelevant image components. To overcome this, we explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach by allowing natural language descriptions for more targeted queries. We explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating the image similarity for progressively more detailed queries. We find that there is a sweet-spot of detail in the text that gives best results and find that words describing the "tone" of a scene (such as messy, dingy) are quite important in maximizing text-image similarity.
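The retrieval scheme the abstract describes, ranking a gallery of images against a natural-language query by the similarity of their CLIP features, can be sketched as a cosine-similarity search over embedding vectors. The snippet below is a minimal illustration only: it uses random vectors as stand-ins for real CLIP image and text embeddings (a working system would produce these with a CLIP image and text encoder), and the `retrieve` helper is a hypothetical name, not part of any CLIP library.

```python
import numpy as np

def retrieve(text_emb, image_embs, k=3):
    """Rank images by cosine similarity to a text query embedding.

    text_emb: (d,) query embedding; image_embs: (n, d) gallery embeddings.
    Returns the indices of the top-k images and their similarity scores.
    """
    t = text_emb / np.linalg.norm(text_emb)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = im @ t                      # cosine similarity to each image
    order = np.argsort(-sims)[:k]      # highest similarity first
    return order, sims[order]

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 512))  # stand-in for CLIP image features
text_emb = rng.normal(size=512)           # stand-in for an encoded text query
top_idx, top_sims = retrieve(text_emb, image_embs)
```

Evaluating "progressively more detailed queries," as the paper does, would amount to re-encoding longer query strings and comparing the resulting similarity scores.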
Keywords
Deep Learning, Image Retrieval, Human Computer Interaction