Multi-Image Visual Question Answering for Unsupervised Anomaly Detection
arXiv (2024)
Abstract
Unsupervised anomaly detection enables the identification of potential
pathological areas by juxtaposing original images with their pseudo-healthy
reconstructions generated by models trained exclusively on normal images.
However, the clinical interpretation of resultant anomaly maps presents a
challenge due to a lack of detailed, understandable explanations. Recent
advancements in language models have shown the capability of mimicking
human-like understanding and providing detailed descriptions. This raises an
interesting question: How can language models be employed to make the
anomaly maps more explainable? To the best of our knowledge, we are the first
to leverage a language model for unsupervised anomaly detection, for which we
construct a dataset with different questions and answers. Additionally, we
present a novel multi-image visual question answering framework tailored for
anomaly detection, incorporating diverse feature fusion strategies to enhance
visual knowledge extraction. Our experiments reveal that the framework,
augmented by our new Knowledge Q-Former module, adeptly answers questions on
the anomaly detection dataset. Moreover, integrating anomaly maps as inputs
distinctly improves the detection of unseen pathologies.
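
To make the notion of an anomaly map concrete, the following is a minimal sketch of one common way such a map can be obtained: a pixel-wise absolute residual between an image and its pseudo-healthy reconstruction, followed by a threshold. The abstract does not specify how the maps are computed, so the residual formulation, the function names `anomaly_map` and `threshold`, the cutoff `tau`, and the toy 3x3 arrays are all illustrative assumptions rather than the authors' method.

```python
# Illustrative sketch only: the residual formulation and all names here are
# assumptions, not taken from the paper.

def anomaly_map(image, reconstruction):
    """Return the pixel-wise absolute residual between two 2D images."""
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same shape")
    rows = []
    for row_img, row_rec in zip(image, reconstruction):
        if len(row_img) != len(row_rec):
            raise ValueError("image and reconstruction must have the same shape")
        rows.append([abs(a - b) for a, b in zip(row_img, row_rec)])
    return rows


def threshold(amap, tau):
    """Binarize an anomaly map: 1 where the residual exceeds tau, else 0."""
    return [[1 if value > tau else 0 for value in row] for row in amap]


# Toy 3x3 "image" with one bright pixel that the (hypothetical) model trained
# only on normal data fails to reproduce in its reconstruction.
image = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
reconstruction = [
    [0.1, 0.2, 0.1],
    [0.2, 0.3, 0.2],
    [0.1, 0.2, 0.1],
]

amap = anomaly_map(image, reconstruction)
mask = threshold(amap, tau=0.5)  # tau is an arbitrary illustrative cutoff
```

In this toy case only the centre pixel differs between the two inputs, so it is the only location flagged in `mask`. A map of this kind carries no explanation of what the highlighted region is, which is the gap the question answering framework described above is meant to address.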