Home

Awesome

Adversarial Distributional Training

This repository contains the code for adversarial distributional training (ADT) introduced in the following paper

Adversarial Distributional Training for Robust Deep Learning (NeurIPS 2020)

Yinpeng Dong*, Zhijie Deng*, Tianyu Pang, Hang Su, and Jun Zhu (* indicates equal contribution)

Citation

If you find our methods useful, please consider citing:

@inproceedings{dong2020adversarial,
  title={Adversarial Distributional Training for Robust Deep Learning},
  author={Dong, Yinpeng and Deng, Zhijie and Pang, Tianyu and Su, Hang and Zhu, Jun},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

Introduction

Adversarial distribution training (ADT) is a new framework to train robust deep learning models. It is formulated as a minimax optimization problem, in which the inner maximization aims to find an adversarial distribution for each natural input to characterize potential adversarial examples; and the outer minimization aims to optimize DNN parameters with the worst-case adversarial distributions.

In this paper, we proposed three different approaches to parameterize the adversarial distributions, as illustrated below.

<img src="algos_adt.jpg">

Figure 1: An illustration of three different ADT methods, including (a) ADT<sub>EXP</sub>; (b) ADT<sub>EXP-AM</sub>; (c) ADT<sub>IMP-AM</sub>.

Prerequisites

Training

We have proposed three different methods for ADT. The command for each training method is specified below.

Training ADT<sub>EXP</sub>

python adt_exp.py --model-dir adt-exp --dataset cifar10 (or cifar100/svhn)

Training ADT<sub>EXP-AM</sub>

python adt_expam.py --model-dir adt-expam --dataset cifar10 (or cifar100/svhn)

Training ADT<sub>IMP-AM</sub>

python adt_impam.py --model-dir adt-impam --dataset cifar10 (or cifar100/svhn)

The checkpoints will be saved at each model folder.

Evaluation

Evaluation under White-box Attacks

python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method FGSM --dataset cifar10 (or cifar100/svhn)
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method PGD --num-steps 20 (or 100) --dataset cifar10 (or cifar100/svhn)
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method MIM --num-steps 20 --dataset cifar10 (or cifar100/svhn)
python evaluate_attacks.py --model-path ${MODEL-PATH} --attack-method CW --num-steps 30 --dataset cifar10 (or cifar100/svhn)
python feature_attack.py --model-path ${MODEL-PATH} --dataset cifar10 (or cifar100/svhn)

Evaluation under Transfer-based Black-box Attacks

First change the --white-box-attack argument in evaluate_attacks.py to False. Then run

python evaluate_attacks.py --source-model-path ${SOURCE-MODEL-PATH} --target-model-path ${TARGET-MODEL-PATH} --attack-method PGD (or MIM)

Evaluation under SPSA

python spsa.py --model-path ${MODEL-PATH} --samples_per_draw 256 (or 512/1024/2048)

Pretrained Models

We have provided the pre-trained models on CIFAR-10, whose performance is reported in Table 1. They can be downloaded at

Contact

Yinpeng Dong: dyp17@mails.tsinghua.edu.cn

Zhijie Deng: dzj17@mails.tsinghua.edu.cn