Experimental Evaluation of Adversarial Attacks Against Natural Language Machine Learning Models

Jonathan Li, Steven Pugh, Honghe Zhou, Lin Deng, Josh Dehlinger, Suranjan Chakraborty

SERA (2023)

Abstract
Machine learning models are increasingly relied on for many natural language processing tasks. However, these models are vulnerable to adversarial attacks, i.e., inputs crafted to trick a model into making a wrong prediction. Among the different methods of attacking a model, it is important to understand which attacks are effective, so that countermeasures can be designed to protect the models. In this paper, we design and implement six adversarial attacks against natural language machine learning models. We then evaluate the effectiveness of these attacks using a fine-tuned distilled BERT model and 5,000 sample sentences from the SST-2 dataset. Our results indicate that the Word-replace attack affected the model the most, reducing its F1-score by 34%. The Word-delete attack is the least effective, but still reduces the model's accuracy by 17%. Based on the experimental results, we discuss our insights and provide recommendations for building robust natural language machine learning models.
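The sketch below is a minimal, hypothetical illustration of the kind of Word-delete and Word-replace perturbations the abstract describes; the paper's actual attack implementations are not reproduced here. It assumes the publicly available HuggingFace checkpoint `distilbert-base-uncased-finetuned-sst-2-english` as a stand-in for the authors' fine-tuned distilled BERT model, and the perturbation logic (random word deletion, replacement from a tiny vocabulary) is a simplification for demonstration only.

```python
import random
from transformers import pipeline

# Stand-in sentiment model; the authors' own fine-tuned DistilBERT is not public here.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def word_delete(sentence: str, rng: random.Random) -> str:
    """Drop one randomly chosen word (simplified stand-in for the Word-delete attack)."""
    words = sentence.split()
    if len(words) <= 1:
        return sentence
    del words[rng.randrange(len(words))]
    return " ".join(words)

def word_replace(sentence: str, rng: random.Random,
                 vocab=("good", "bad", "okay")) -> str:
    """Replace one randomly chosen word with a word from a small vocabulary
    (simplified stand-in for the Word-replace attack)."""
    words = sentence.split()
    if not words:
        return sentence
    words[rng.randrange(len(words))] = rng.choice(vocab)
    return " ".join(words)

if __name__ == "__main__":
    rng = random.Random(0)
    original = "the film is a charming and moving portrait of family life"
    for attack in (word_delete, word_replace):
        perturbed = attack(original, rng)
        before = classifier(original)[0]
        after = classifier(perturbed)[0]
        print(f"{attack.__name__}: '{perturbed}'")
        print(f"  before: {before['label']} ({before['score']:.2f})")
        print(f"  after:  {after['label']} ({after['score']:.2f})")
```

In the paper's evaluation, perturbations like these are applied to 5,000 SST-2 sentences and the drop in F1-score/accuracy of the victim model is measured; the snippet only shows how a single sentence's prediction can shift under such edits.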
Keywords
Adversarial Attack, Machine Learning, Deep Learning, Natural Language Processing