Towards Faithful XAI Evaluation via Generalization-Limited Backdoor Watermark

ICLR 2024(2024)

引用 0|浏览5
暂无评分
摘要
Saliency-based representation visualization (SRV) ($e.g.$, Grad-CAM) is one of the most classical and widely adopted explainable artificial intelligence (XAI) methods for its simplicity and efficiency. It can be used to interpret deep neural networks by locating saliency areas contributing the most to their predictions. However, it is difficult to automatically measure and evaluate the performance of SRV methods due to the lack of ground-truth salience areas of samples. In this paper, we revisit the backdoor-based SRV evaluation, which is currently the only feasible method to alleviate the previous problem. We first reveal its \emph{implementation limitations} and \emph{unreliable nature} due to the trigger generalization of existing backdoor watermarks. Given these findings, we propose a generalization-limited backdoor watermark (GLBW), based on which we design a more faithful XAI evaluation. Specifically, we formulate the training of watermarked DNNs as a min-max problem, where we find the `worst' potential trigger (with the highest attack effectiveness and differences from the ground-truth trigger) via inner maximization and minimize its effects and the loss over benign and poisoned samples via outer minimization in each iteration. In particular, we design an adaptive optimization method to find desired potential triggers in each inner maximization. Extensive experiments on benchmark datasets are conducted, verifying the effectiveness of our generalization-limited watermark. Our codes are available at \url{https://github.com/yamengxi/GLBW}.
更多
查看译文
关键词
XAI,XAI Evaluation,Backdoor Watermark,Backdoor Attack,Backdoor for Positive Purposes
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要