Explaining Explanations: Axiomatic Feature Interactions For Deep Networks

Journal of Machine Learning Research (2021)

Abstract
Recent work has shown great promise in explaining neural network behavior. In particular, feature attribution methods explain the features that are important to a model's prediction on a given input. However, for many tasks, simply identifying significant features may be insufficient for understanding model behavior. The interactions between features within the model may better explain not only the model, but why certain features outrank others in importance. In this work, we present Integrated Hessians, an extension of Integrated Gradients (Sundararajan et al., 2017) that explains pairwise feature interactions in neural networks. Integrated Hessians overcomes several theoretical limitations of previous methods, and unlike them, is not limited to a specific architecture or class of neural network. Additionally, we find that our method is faster than existing methods when the number of features is large, and outperforms previous methods on existing quantitative benchmarks.
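As a rough illustration of the idea, Integrated Hessians can be obtained by applying Integrated Gradients to its own attributions, which for an off-diagonal pair (i, j) and baseline x' yields a double path integral of the mixed second derivative. The sketch below approximates that integral with a midpoint Riemann sum; the toy function `f` and the finite-difference Hessian are hypothetical stand-ins for a trained network and automatic differentiation, not the authors' implementation.

```python
import numpy as np

def f(x):
    # Toy "model" with an explicit multiplicative interaction (hypothetical example).
    return x[0] * x[1] + x[0] ** 2

def hessian_ij(f, z, i, j, eps=1e-3):
    # Central finite-difference estimate of d^2 f / (dx_i dx_j) at point z.
    e_i = np.zeros_like(z); e_i[i] = eps
    e_j = np.zeros_like(z); e_j[j] = eps
    return (f(z + e_i + e_j) - f(z + e_i - e_j)
            - f(z - e_i + e_j) + f(z - e_i - e_j)) / (4 * eps ** 2)

def integrated_hessian(f, x, i, j, baseline=None, steps=64):
    # Midpoint Riemann-sum approximation of the off-diagonal (i != j) term:
    #   Gamma_ij = (x_i - x'_i)(x_j - x'_j) * ∫0^1 ∫0^1 a*b * H_ij(x' + a*b*(x - x')) da db
    if baseline is None:
        baseline = np.zeros_like(x)
    grid = (np.arange(steps) + 0.5) / steps  # midpoints of [0, 1]
    total = 0.0
    for a in grid:
        for b in grid:
            z = baseline + a * b * (x - baseline)
            total += a * b * hessian_ij(f, z, i, j)
    scale = (x[i] - baseline[i]) * (x[j] - baseline[j]) / steps ** 2
    return scale * total

x = np.array([2.0, 3.0])
print(integrated_hessian(f, x, 0, 1))  # ≈ x0*x1/4 = 1.5 for this toy f
```

For the bilinear term in this toy `f`, the mixed Hessian is constant, so the double integral evaluates to 1/4 and the interaction is x0·x1/4; in practice one would replace the finite-difference Hessian with exact second derivatives from an autodiff framework and sum enough interpolation steps for convergence.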
Keywords
Feature attribution, feature interaction, Aumann-Shapley value, interpretability, neural networks