Understanding Frank-Wolfe Adversarial Training

Semantic Scholar (2021)

Abstract
Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss and is widely regarded as the most effective defense against such attacks. We develop a theoretical framework for adversarial training with Frank-Wolfe (FW) optimization (FW-AT) that reveals a geometric connection between the loss landscape and the distortion of ℓ∞ FW attacks (the attack's ℓ2 norm). Specifically, we show that high distortion of FW attacks is equivalent to low variation along the attack path. It is then experimentally demonstrated on various deep neural network architectures that ℓ∞ attacks against robust models achieve near-maximal ℓ2 distortion. This mathematical transparency differentiates FW from the more popular Projected Gradient Descent (PGD) optimization. To demonstrate the utility of our theoretical framework, we develop FW-Adapt, a novel adversarial training algorithm that uses a simple distortion measure to adaptively change the number of attack steps during training. FW-Adapt provides strong robustness at lower training times in comparison to PGD-AT for a variety of white-box and black-box attacks.
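As a rough illustration of the quantities the abstract refers to, the sketch below shows one way a Frank-Wolfe ℓ∞ attack and its relative ℓ2 distortion could be computed. This is not the authors' implementation: the step-size schedule 2/(t+2), the function name fw_linf_attack, and the normalization of the distortion by ε·√d are assumptions made for this example.

```python
import torch

def fw_linf_attack(model, loss_fn, x0, y, eps, steps=20):
    """Minimal sketch of a Frank-Wolfe attack over the l_inf ball B(x0, eps).

    Assumptions (not from the paper): step size gamma_t = 2/(t+2), and the
    distortion is reported as ||x_T - x0||_2 / (eps * sqrt(d)), so a value
    near 1 indicates near-maximal l_2 distortion.
    """
    x = x0.clone().detach()
    d = x0[0].numel()                       # input dimension per example
    for t in range(steps):
        x.requires_grad_(True)
        loss = loss_fn(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        # Linear maximization oracle over the l_inf ball: a corner of the ball.
        v = x0 + eps * grad.sign()
        gamma = 2.0 / (t + 2)
        # Convex combination keeps the iterate inside the feasible ball.
        x = ((1 - gamma) * x + gamma * v).detach()
    # Relative l_2 distortion of the final perturbation.
    delta = (x - x0).flatten(1)
    distortion = delta.norm(dim=1) / (eps * d ** 0.5)
    return x, distortion
```

In an adaptive scheme in the spirit of FW-Adapt, a distortion value of this kind could be monitored during training and used to decide how many attack steps to run, though the exact rule used by the paper is not given in the abstract.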