Spectral regularization for adversarially-robust representation learning
CoRR (2024)

Abstract
The vulnerability of neural network classifiers to adversarial attacks is a
major obstacle to their deployment in safety-critical applications.
Regularization of network parameters during training can be used to improve
adversarial robustness and generalization performance. Usually, the network is
regularized end-to-end, with parameters at all layers affected by
regularization. However, in settings where learning representations is key,
such as self-supervised learning (SSL), layers after the feature representation
will be discarded when performing inference. For these models, regularizing up
to the feature space is more suitable. To this end, we propose a new spectral
regularizer for representation learning that encourages black-box adversarial
robustness in downstream classification tasks. In supervised classification
settings, we show empirically that this method is more effective in boosting
test accuracy and robustness than previously-proposed methods that regularize
all layers of the network. We then show that this method improves the
adversarial robustness of classifiers using representations learned with
self-supervised training or transferred from another classification task. In
all, our work begins to unveil how representational structure affects
adversarial robustness.
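The abstract does not spell out the regularizer's exact form, so the following is only an illustrative sketch of the general idea: penalizing the spectral norms of the weight matrices up to the feature representation, estimated by power iteration. The function names (`spectral_norm`, `spectral_penalty`) and the choice of a sum of squared spectral norms are assumptions for illustration, not the paper's method.

```python
import numpy as np

def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W via power iteration.

    This is a standard estimator, not necessarily the one used in the paper.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

def spectral_penalty(feature_weights, coeff=1e-3):
    """Hypothetical penalty: sum of squared spectral norms over the
    feature-extractor layers only (layers after the representation,
    e.g. an SSL projection head, are deliberately excluded)."""
    return coeff * sum(spectral_norm(W) ** 2 for W in feature_weights)

# Example: a diagonal matrix whose spectral norm is its largest entry.
W = np.array([[3.0, 0.0], [0.0, 1.0]])
print(spectral_norm(W))        # close to 3.0
print(spectral_penalty([W]))   # coeff * 3.0**2
```

In training, a term like `spectral_penalty` would be added to the task loss, touching only the parameters up to the feature space, in line with the abstract's point that post-representation layers are discarded at inference.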