Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection
CoRR (2024)
Abstract
Image anomaly detection has long been a challenging task in computer vision.
The advent of vision-language models, particularly the rise of CLIP-based
frameworks, has opened new avenues for zero-shot anomaly detection. Recent
studies have explored the use of CLIP by aligning images with normal and
anomalous prompt descriptions. However, exclusive dependence on textual
guidance often falls short, highlighting the critical importance of
additional visual references. In this work, we introduce a Dual-Image
Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our method processes pairs of images, utilizing each as a visual reference
for the other, thereby enriching the inference process with visual context.
This dual-image strategy markedly enhances both anomaly classification and
localization performance. Furthermore, we strengthen our model with a
test-time adaptation module that incorporates synthesized anomalies to
refine localization capabilities. Our approach significantly exploits the
potential of joint vision-language anomaly detection and demonstrates
performance comparable to current SOTA methods across various datasets.
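The joint vision-language scoring idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: `dual_image_scores`, the two-branch fusion by simple averaging, and the nearest-neighbor visual-reference term are all assumptions; the inputs stand in for L2-normalized CLIP patch and prompt embeddings.

```python
import torch
import torch.nn.functional as F


def dual_image_scores(feats_a: torch.Tensor,
                      feats_b: torch.Tensor,
                      text_normal: torch.Tensor,
                      text_anomaly: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of dual-image vision-language anomaly scoring.

    feats_a, feats_b: (N, D) and (M, D) L2-normalized patch features
        of the two paired images.
    text_normal, text_anomaly: (D,) L2-normalized text embeddings for
        the normal and anomalous prompt descriptions.
    Returns per-patch anomaly scores in [0, 1] for image A.
    """
    # Language branch: softmax over cosine similarity to the two
    # prompt classes; take the probability of the anomalous class.
    sims = torch.stack(
        [feats_a @ text_normal, feats_a @ text_anomaly], dim=-1)
    lang_score = sims.softmax(dim=-1)[..., 1]

    # Vision branch: image B serves as the visual reference. A patch of
    # A with no close match among B's patches is scored as anomalous
    # (cosine distance to its nearest neighbor, scaled into [0, 1]).
    vis_score = (1.0 - feats_a @ feats_b.T).min(dim=-1).values / 2.0

    # Joint score: a simple average of the two branches (an assumed
    # fusion rule; the paper's actual combination may differ).
    return 0.5 * (lang_score + vis_score)


if __name__ == "__main__":
    # Random stand-ins for CLIP features, normalized like CLIP outputs.
    torch.manual_seed(0)
    fa = F.normalize(torch.randn(196, 512), dim=-1)   # patches of image A
    fb = F.normalize(torch.randn(196, 512), dim=-1)   # patches of image B
    tn = F.normalize(torch.randn(512), dim=0)         # "normal" prompt
    ta = F.normalize(torch.randn(512), dim=0)         # "anomalous" prompt
    scores = dual_image_scores(fa, fb, tn, ta)
    print(scores.shape)                               # torch.Size([196])
```

In the same spirit, swapping the roles of the two images yields scores for image B, so each image in the pair acts as the other's reference.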