NTD: Non-Transferability Enabled Deep Learning Backdoor Detection

IEEE Transactions on Information Forensics and Security (2024)

Abstract
The research community has made advances in mitigating recent insidious backdoor attacks on deep learning models. Nonetheless, state-of-the-art defenses are either limited to specific backdoor attacks (i.e., source-agnostic attacks) or are not user-friendly, in that they require machine learning expertise and/or expensive computing resources. This work observes that all existing backdoor attacks have an inadvertent and inevitable intrinsic weakness, termed non-transferability: a trigger input hijacks a backdoored model but is not effective against another model that has not been implanted with the same backdoor. Building on this key observation, we propose non-transferability enabled backdoor detection (NTD) to identify trigger inputs for a model-under-test at run-time. Specifically, our detection first lets the potentially backdoored model-under-test predict a label for an input. It then uses a feature extractor to extract feature vectors for the input and for a group of samples randomly picked from the predicted class, and compares the similarity between the input and those samples in the feature extractor's latent space to determine whether the input is a trigger input or a benign one. The feature extractor can be provided by a reputable party, or it can be a free pre-trained model that the user privately reserves from any open platform (e.g., ModelZoo, GitHub, Kaggle). Our detection therefore does not require the user to have any machine learning expertise or to perform costly computations. Extensive experimental evaluations on four common tasks affirm that our detection scheme has high effectiveness (low false acceptance rate) and usability (low false rejection rate), with low detection latency, against different types of backdoor attacks.
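The run-time check described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the abstract does not state the similarity metric, the decision rule, the group size, or the threshold, so cosine similarity, a mean-below-threshold rule, `num_samples=5`, and `threshold=0.5` are all assumptions made here for illustration. The names `is_trigger_input`, `held_out_by_class`, and the toy models are likewise hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_trigger_input(
    x,
    model_under_test: Callable,
    feature_extractor: Callable,
    held_out_by_class: Dict[int, List],
    threshold: float = 0.5,  # assumed value; a real deployment would calibrate it
    num_samples: int = 5,    # assumed group size
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if x looks like a trigger input under the assumed decision rule."""
    rng = rng or random.Random()
    # Step 1: the (possibly backdoored) model-under-test predicts a label.
    predicted = model_under_test(x)
    pool = held_out_by_class[predicted]
    if not pool:
        raise ValueError(f"no held-out samples for class {predicted}")
    # Step 2: randomly pick a group of samples from the predicted class.
    group = rng.sample(pool, min(num_samples, len(pool)))
    # Step 3: compare the input with the group in the extractor's latent space.
    fx = feature_extractor(x)
    sims = [cosine_similarity(fx, feature_extractor(s)) for s in group]
    # Assumed rule: low mean similarity means the label is not supported
    # by an independent model, so the input is flagged as a trigger.
    return sum(sims) / len(sims) < threshold


# --- Toy demonstration (all data and models below are hypothetical) ---
# Each input is (f0, f1, trig); trig == 1 stands in for a backdoor trigger.
def toy_backdoored_model(x):
    f0, f1, trig = x
    if trig == 1:          # the backdoor: trigger forces class 1
        return 1
    return 0 if f0 > f1 else 1


def toy_clean_extractor(x):
    f0, f1, _ = x          # an independent model never learned the trigger
    return (f0, f1)


held_out = {
    0: [(1.0, 0.1, 0), (0.9, 0.2, 0), (1.0, 0.0, 0)],
    1: [(0.1, 1.0, 0), (0.2, 0.9, 0), (0.0, 1.0, 0)],
}

benign_flagged = is_trigger_input(
    (0.95, 0.1, 0), toy_backdoored_model, toy_clean_extractor, held_out
)
trigger_flagged = is_trigger_input(
    (0.95, 0.1, 1), toy_backdoored_model, toy_clean_extractor, held_out
)
```

In the toy case, the trigger flips the model-under-test's label to class 1, but the clean extractor still places the input near class 0 samples, so its similarity to the class 1 group is low and it is flagged. In practice the feature extractor would be a pre-trained network and the threshold would need to be tuned against the false acceptance and false rejection rates the abstract mentions.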
Keywords
Backdoor countermeasure,NTD,non-transferability,deep learning