Combating Multi-level Adversarial Text with Pruning based Adversarial Training

IEEE International Joint Conference on Neural Networks (IJCNN), 2022

Abstract
Despite significant advancements in deep learning-based models for natural language processing (NLP) tasks, previous efforts have shown that numerous models, including deep neural networks (DNNs), suffer from moderate to significant performance degradation on adversarial examples. An adversary crafts malicious text by adding, deleting, or modifying characters, words, and sentences to fool DNN models. Adversarial training and model-enhancement methods have therefore been proposed to combat such attacks. However, both methods lack generalization due to the intrinsic overfitting of neural networks. In this paper, we propose a novel framework to combat text adversarial examples, namely DisPAT, which consists of an adversarial text discriminator and a robust pruned text classifier. First, we explore the distributions of adversarial and benign examples in the embedding space, indicating the feasibility of a DNN-based discriminator. To obtain multi-level adversarial texts, we deploy a generator, and a discriminator to identify adversarial perturbations. Notably, at the inference stage, our pipeline places the well-trained discriminator in front of the text classifier to filter out char-level adversarial text. Finally, we apply neuron-salience-based pruning to specifically improve the classifier's performance on adversarial text. Experimental results show that our approach outperforms state-of-the-art baselines in combating both char-level and word-level adversarial text. Moreover, on benign input, DisPAT achieves accuracy very close to, or even higher than, that of the standard model.
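The char-level perturbations mentioned in the abstract (adding, deleting, and modifying characters) can be sketched as a toy perturbation function. This is a minimal illustration only: the paper's generator is learned, and the function name and the uniform random edit policy below are assumptions, not the authors' method.

```python
import random

def char_perturb(text: str, rng: random.Random) -> str:
    """Apply one random character-level edit: insert, delete, or substitute.

    Illustrative sketch of a char-level adversarial perturbation; a real
    attack would pick edits that maximally confuse the target classifier.
    """
    if not text:
        return text
    i = rng.randrange(len(text))          # position to edit
    op = rng.choice(["insert", "delete", "substitute"])
    c = rng.choice("abcdefghijklmnopqrstuvwxyz")
    if op == "insert":
        return text[:i] + c + text[i:]    # length grows by 1
    if op == "delete":
        return text[:i] + text[i + 1:]    # length shrinks by 1
    return text[:i] + c + text[i + 1:]    # length unchanged
```

A discriminator of the kind described in the abstract would be trained to separate such perturbed strings from benign text before they ever reach the classifier.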
Keywords
Adversarial training, Adversarial text, Model pruning, Deep neural network