Adversarial Examples for Chinese Text Classification

2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)

Abstract
Deep neural networks (DNNs) have been widely adopted in areas such as image recognition and natural language processing. However, many works have shown that DNNs for image classification are vulnerable to adversarial examples, which are generated by adding small-magnitude perturbations to the original inputs. In this paper, we show that DNNs for Chinese text classification are also vulnerable to adversarial examples. We propose a marginal attack method that generates adversarial examples capable of fooling these DNNs. The method uses the Naïve Bayes principle to filter sensitive words and appends only a small number of them to the end of the original text. The generated adversarial examples can fool a variety of Chinese text classification DNNs, causing the text to be classified into an incorrect category with high probability. We conduct extensive experiments to evaluate the attack performance, and the results show that the success ratio of the attacks can reach almost 100% by adding only five sensitive words.
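The abstract does not spell out the exact scoring and selection rules of the marginal attack, so the following is only a minimal sketch of the general idea: a character-level multinomial Naïve Bayes model scores tokens, tokens with the highest log-likelihood ratio toward a target (wrong) class are treated as "sensitive words", and a few of them are appended to the end of the text. All names here (`NaiveBayesScorer`, `select_sensitive_words`, `marginal_attack`) are hypothetical, not the paper's implementation.

```python
from collections import Counter, defaultdict
import math


class NaiveBayesScorer:
    """Multinomial Naive Bayes over character tokens (common for Chinese)."""

    def __init__(self):
        self.class_word_counts = defaultdict(Counter)
        self.class_totals = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            for tok in text:  # character-level tokenization
                self.class_word_counts[label][tok] += 1
                self.class_totals[label] += 1
                self.vocab.add(tok)

    def log_likelihood(self, tok, label):
        # Laplace-smoothed log P(token | class)
        num = self.class_word_counts[label][tok] + 1
        den = self.class_totals[label] + len(self.vocab)
        return math.log(num / den)


def select_sensitive_words(scorer, target_label, other_labels, k=5):
    """Pick the k tokens whose log-likelihood ratio most favors the
    target (wrong) class over every other class; these serve as the
    'sensitive words' to append."""
    def score(tok):
        return scorer.log_likelihood(tok, target_label) - max(
            scorer.log_likelihood(tok, lbl) for lbl in other_labels
        )
    return sorted(scorer.vocab, key=score, reverse=True)[:k]


def marginal_attack(text, scorer, target_label, other_labels, k=5):
    """Append k sensitive words to the end of the text, leaving the
    original content untouched (hence 'marginal')."""
    words = select_sensitive_words(scorer, target_label, other_labels, k)
    return text + "".join(words)
```

Because the appended tokens carry strong class evidence under a bag-of-words view, even a handful of them can dominate a classifier's decision without modifying the original content, which is consistent with the reported near-100% success ratio at five added words.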
Keywords
Marginal Attack, Adversarial Example, Chinese Text Classification, Deep Learning