WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
CoRR (2024)
Abstract
Recent advances in neural networks have demonstrated remarkable capabilities
across many domains. Despite these successes, the "black box" problem remains.
To address it, we propose WWW, a novel framework that explains the 'what',
'where', and 'why' of neural network decisions in human-understandable terms.
Specifically, WWW explains 'what' through adaptive selection for concept
discovery, using adaptive cosine similarity and thresholding. To address
'where' and 'why', we propose a novel combination of neuron activation maps
(NAMs) with Shapley values, generating localized concept maps and heatmaps for
individual inputs. Furthermore, WWW introduces a method for predicting
uncertainty that leverages heatmap similarities to estimate 'how' reliable a
prediction is. Experimental evaluations show that WWW outperforms existing
interpretability methods on both quantitative and qualitative metrics. WWW
provides a unified solution for explaining 'what', 'where', and 'why',
introduces a method for deriving localized explanations from global
interpretations, and offers a plug-and-play solution adaptable to various
architectures.
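The abstract does not spell out how adaptive concept selection works. As a rough illustration only, the Python sketch below assumes each neuron and each candidate concept have already been embedded as vectors in a shared space (how those embeddings are produced is not specified here). It applies a hypothetical per-neuron threshold: a concept is kept if its cosine similarity lies within `margin` of that neuron's best match. The function names and the `margin` rule are illustrative assumptions, not WWW's actual algorithm.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T  # shape (n, m)


def adaptive_concept_selection(neuron_emb, concept_emb, concept_names, margin=0.1):
    """For each neuron, return the concepts whose similarity is within `margin`
    of that neuron's best similarity, most similar first.

    Hypothetical adaptive threshold (an assumption, not necessarily WWW's rule):
    the cut-off is set per neuron, so it adapts to how strongly that neuron
    matches any concept at all.
    """
    sims = cosine_similarity_matrix(neuron_emb, concept_emb)
    selected = []
    for row in sims:
        threshold = row.max() - margin          # per-neuron threshold
        keep = np.flatnonzero(row >= threshold)  # best concept always passes
        keep = keep[np.argsort(-row[keep])]      # sort kept concepts by similarity
        selected.append([concept_names[j] for j in keep])
    return selected


if __name__ == "__main__":
    names = ["stripes", "fur", "wheel"]          # hypothetical concept vocabulary
    neurons = np.array([[1.0, 0.0], [0.0, 1.0]])
    concepts = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    print(adaptive_concept_selection(neurons, concepts, names, margin=0.3))
```

Because the threshold is relative to each neuron's best match, a neuron with one dominant concept receives a single label, while a neuron with several near-equal matches receives several.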
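The 'where', 'why', and 'how' steps are described only at a high level. One plausible reading is sketched below in Python under assumptions that do not come from the paper. Neuron importance is the exact Shapley value of each neuron for a class logit of a linear head. The heatmap is the Shapley-weighted sum of the neurons' activation maps (NAMs), with negative contributions clipped. Reliability is the cosine similarity between an input's heatmap and some reference heatmap. All function names are hypothetical, and exact Shapley enumeration is only feasible for a handful of neurons.

```python
import itertools
import math

import numpy as np


def exact_shapley(value_fn, n_players):
    """Exact Shapley values by enumerating every coalition (exponential in n_players)."""
    phi = np.zeros(n_players)
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for size in range(n_players):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = (math.factorial(size) * math.factorial(n_players - size - 1)
                      / math.factorial(n_players))
            for coalition in itertools.combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def linear_logit_value_fn(activations, class_weights):
    """Hypothetical value function: the class logit of a linear head when only
    neurons in the coalition keep their activation and all others are set to 0."""
    def value(coalition):
        kept = np.zeros_like(activations)
        for j in coalition:
            kept[j] = activations[j]
        return float(class_weights @ kept)
    return value


def shapley_weighted_heatmap(nams, shapley):
    """Sum activation maps (n, H, W) weighted by Shapley values clipped at 0,
    then scale so the peak is 1 (left unscaled if nothing is positive)."""
    weights = np.clip(shapley, 0.0, None)
    heat = np.tensordot(weights, nams, axes=1)  # contracts the neuron axis -> (H, W)
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def heatmap_reliability(heatmap, reference):
    """Cosine similarity between two heatmaps; lower is read as less reliable (assumption)."""
    a, b = heatmap.ravel(), reference.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


if __name__ == "__main__":
    activations = np.array([2.0, 1.0, 0.5])
    class_weights = np.array([1.0, -1.0, 3.0])
    phi = exact_shapley(linear_logit_value_fn(activations, class_weights), 3)
    nams = np.zeros((3, 2, 2))
    nams[0, 0, 0] = nams[1, 0, 1] = nams[2, 1, 1] = 1.0
    heat = shapley_weighted_heatmap(nams, phi)
    print(phi, heat, heatmap_reliability(heat, heat))
```

For a linear head with zero-valued masking, the game is additive, so each neuron's Shapley value reduces to its weight times its activation. This makes the sketch easy to sanity-check. For realistic layer widths, exact enumeration grows exponentially, and a sampling-based Shapley approximation would typically be used instead.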