TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models
arXiv (2023)
Abstract
Interpreting the learned features of vision models has posed a longstanding
challenge in the field of machine learning. To address this issue, we propose a
novel method that leverages the capabilities of language models to interpret
the learned features of pre-trained image classifiers. Our method, called
TExplain, tackles this task by training a neural network to establish a
connection between the feature space of image classifiers and language models.
Then, during inference, our approach generates a large number of sentences to
explain the features learned by the classifier for a given image. The most
frequent words are then extracted from these sentences, summarizing the
features and patterns the classifier has learned. Our method, for the first
time, uses these frequent words
corresponding to a visual representation to provide insights into the
decision-making process of the independently trained classifier, enabling the
detection of spurious correlations, biases, and a deeper comprehension of its
behavior. To validate the effectiveness of our approach, we conduct experiments
on diverse datasets, including ImageNet-9L and Waterbirds. The results
demonstrate the potential of our method to enhance the interpretability and
robustness of image classifiers.
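
The word-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the explanatory sentences have already been generated by the language model and are available as plain strings. The function name `most_frequent_words`, the stop-word list, and the sample sentences are all hypothetical and not taken from the paper; the abstract does not say how words are tokenized or filtered.

```python
from collections import Counter
import re

# Hypothetical stop-word list; the source does not specify any filtering.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "with", "at", "to"}


def most_frequent_words(sentences, top_k=5):
    """Count content words across sentences and return the top_k most common."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase and keep alphabetic tokens only (an assumed tokenization).
        tokens = re.findall(r"[a-z]+", sentence.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_k)


# Made-up sentences standing in for language-model output for one image.
sentences = [
    "A bird standing on the water.",
    "A small bird near water and reeds.",
    "The bird is flying over a lake.",
]
top_words = most_frequent_words(sentences, top_k=2)
print(top_words)
```

In this toy input, a background word such as "water" ranks alongside the object word "bird", which is the kind of signal the abstract suggests could point to a spurious correlation.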