Robustifying Language Models with Test-Time Adaptation
CoRR (2023)
Abstract
Large-scale language models have achieved state-of-the-art performance on a
number of language tasks. However, they fail on adversarial language examples,
which are sentences optimized to fool the language models while keeping a
similar semantic meaning for humans. While prior work focuses on making the
language model robust at training time, retraining for robustness is often
unrealistic for large-scale foundation models. Instead, we propose to make the
language models robust at test time. By dynamically adapting the input sentence
with predictions from masked words, we show that we can reverse many language
adversarial attacks. Since our approach does not require any training, it works
for novel tasks at test time and can adapt to novel adversarial corruptions.
Visualizations and empirical results on two popular sentence classification
datasets demonstrate that our method can repair adversarial language attacks
over 65% of the time.
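The core idea of the abstract — masking each word in the input and accepting the masked-word prediction to undo word-level adversarial substitutions — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `toy_masked_lm` is a hypothetical stand-in for a real pretrained masked language model (e.g. BERT), and `repair_sentence` is an assumed name for the repair loop.

```python
def toy_masked_lm(tokens, i):
    """Hypothetical stand-in for a masked LM: given the tokens with
    position i treated as masked, return the most likely fill.
    A real system would query a pretrained model here (assumption)."""
    # Tiny lookup keyed on the surrounding (unmasked) words.
    table = {
        ("the", "sat", "on", "the", "mat"): "cat",
    }
    key = tuple(t for j, t in enumerate(tokens) if j != i)
    # Fall back to the original token when the model has no opinion.
    return table.get(key, tokens[i])

def repair_sentence(tokens, masked_lm):
    """Mask each position in turn and accept the model's prediction,
    reversing word-level adversarial substitutions at test time."""
    repaired = list(tokens)
    for i in range(len(repaired)):
        repaired[i] = masked_lm(repaired, i)
    return repaired
```

For example, if an attack swapped "cat" for an off-distribution word, masking that position lets the language model restore the likely original, and the repaired sentence is then passed to the downstream classifier. A real implementation would also threshold on prediction confidence rather than always accepting the fill.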
Keywords
language models, adaptation, test-time