Overfitting in adversarially robust deep learning

A repository that implements the experiments for exploring the phenomenon of robust overfitting, where robust performance on the test set degrades significantly over the course of training. Created by Leslie Rice, Eric Wong, and Zico Kolter. See our paper on arXiv here.

News

Robust overfitting hurts - early stopping is essential!

A large amount of research over the past couple of years has looked into defending deep networks against adversarial examples, with significant improvements over the well-known PGD-based adversarial training defense. However, adversarial training doesn't always behave like standard training. The main observation we find is that, unlike in standard training, training to convergence can significantly harm robust generalization: the robust test error begins to increase well before training has converged, as seen in the following learning curve:

[Figure: learning curve illustrating robust overfitting]

After the initial learning rate decay, the robust test error actually increases! As a result, training to convergence is bad for adversarial training, and oftentimes, simply training for one epoch after decaying the learning rate achieves the best robust error on the test set. This behavior is reflected across multiple datasets, different approaches to adversarial training, and both L-infinity and L-2 threat models.
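To make this observation concrete, here is a minimal sketch (not the repository's exact code) of an L-infinity PGD attack and a robust-error evaluation that can be run after every epoch to trace out a learning curve like the one above. The hyperparameters (epsilon = 8/255, step size 2/255, 10 attack steps) and the default device are illustrative assumptions, roughly matching common CIFAR-10 settings.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, X, y, eps=8/255, alpha=2/255, steps=10):
    """Return an adversarial perturbation delta with ||delta||_inf <= eps."""
    delta = torch.zeros_like(X, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(X + delta), y)
        loss.backward()
        # Gradient ascent on the loss, projected back onto the l-infinity ball.
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta.data = (X + delta.data).clamp(0, 1) - X
        delta.grad.zero_()
    return delta.detach()

def robust_error(model, loader, device="cuda"):
    """Fraction of examples misclassified under the PGD attack."""
    model.eval()
    errors, total = 0, 0
    for X, y in loader:
        X, y = X.to(device), y.to(device)
        delta = pgd_linf(model, X, y)
        with torch.no_grad():
            errors += (model(X + delta).argmax(1) != y).sum().item()
        total += y.size(0)
    return errors / total
```

Evaluating `robust_error` on the test set at the end of each epoch is enough to reproduce the qualitative shape of the curve: the value drops sharply at the first learning rate decay and then drifts upward as training continues.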

No algorithmic improvements over PGD-based adversarial training

We can apply this knowledge to PGD-based adversarial training (e.g. as done by the original paper here), and find that early stopping can reduce the robust test error by 8%! As a result, we find that PGD-based adversarial training is as good as existing SOTA methods for adversarial robustness (e.g. on par with or slightly better than TRADES). On the flip side, we note that the results reported by TRADES also rely on early stopping, since training the TRADES approach to convergence results in a significant increase in robust test error. Unfortunately, this means that all of the algorithmic gains over PGD-based adversarial training can be equivalently obtained with early stopping.
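For illustration, an early-stopping loop around adversarial training might look like the following sketch. The `train_epoch` callable (one epoch of PGD-based adversarial training) and the `robust_error` helper from the sketch above are assumed names, not the repository's actual API; the idea is simply to keep the checkpoint with the lowest robust error on held-out data.

```python
import copy

def adversarial_training_with_early_stopping(model, train_epoch, opt, scheduler,
                                              train_loader, val_loader, num_epochs):
    """Run adversarial training and return the checkpoint with the lowest
    robust validation error, i.e. the "early-stopped" model."""
    best_err = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    for epoch in range(num_epochs):
        train_epoch(model, train_loader, opt)   # one epoch of adversarial training
        scheduler.step()                        # piecewise learning-rate decay
        err = robust_error(model, val_loader)   # robust error on held-out data
        if err < best_err:                      # keep the best checkpoint so far
            best_err = err
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_err
```

In practice, the selected checkpoint typically lands within a few epochs of the first learning rate decay, consistent with the learning curve above.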

What is in this repository?

Model weights for the following models can be found in this drive folder: