Exploring CLIP for Real World, Text-based Image Retrieval

Manal Sultan, Lia Jacobs, Abby Stylianou, Robert Pless

Applied Imagery Pattern Recognition Workshop (2023)

Abstract
We consider the ability of CLIP features to support text-driven image retrieval. Traditional image-based queries sometimes misalign with user intentions due to their focus on irrelevant image components. To overcome this, we explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach by allowing natural language descriptions for more targeted queries. We explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating the image similarity for progressively more detailed queries. We find that there is a sweet-spot of detail in the text that gives best results and find that words describing the "tone" of a scene (such as messy, dingy) are quite important in maximizing text-image similarity.
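The retrieval scheme the abstract describes, ranking a gallery of images against a natural-language query by the similarity of their CLIP features, can be sketched as a cosine-similarity search over embedding vectors. The snippet below is a minimal illustration only: it uses random vectors as stand-ins for real CLIP image and text embeddings (a working system would produce these with a CLIP image and text encoder), and the `retrieve` helper is a hypothetical name, not part of any CLIP library.

```python
import numpy as np

def retrieve(text_emb, image_embs, k=3):
    """Rank images by cosine similarity to a text query embedding.

    text_emb: (d,) query embedding; image_embs: (n, d) gallery embeddings.
    Returns the indices of the top-k images and their similarity scores.
    """
    t = text_emb / np.linalg.norm(text_emb)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = im @ t                      # cosine similarity to each image
    order = np.argsort(-sims)[:k]      # highest similarity first
    return order, sims[order]

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 512))  # stand-in for CLIP image features
text_emb = rng.normal(size=512)           # stand-in for an encoded text query
top_idx, top_sims = retrieve(text_emb, image_embs)
```

Evaluating "progressively more detailed queries," as the paper does, would amount to re-encoding longer query strings and comparing the resulting similarity scores.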
Keywords
Deep Learning, Image Retrieval, Human Computer Interaction