Awesome

ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation

This repository contains the implementation code for paper ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation (ICML 2019).

ME-Net is a preprocessing-based defense method against adversarial examples, which is both model-agnostic and attack-agnostic. Being model-agnostic means ME-Net can easily be embedded into existing networks, and being attack-agnostic means ME-Net can improve adversarial robustness against a wide range of black-box and white-box attacks. Specifically, we focus on the intrinsic global structures (e.g., low-rank) within images, and leverage matrix estimation (ME) to exploit such underlying structures for better adversarial robustness.

overview

Dependencies

The current code has been tested on Ubuntu 16.04. You can install the dependencies using

pip install -r requirements.txt

Main Files

The code provided in this repository is able to do the following tasks:

train_pure.py: Train a ME-Net model with standard SGD.
train_adv.py: Train a ME-Net model with adversarial training. We mainly focus on PGD-based adversarial training under L_infinity perturbation bound.
attack_blackbox.py: Perform black-box attacks on trained ME-Net model. We provide three kinds of black-box attacks, including transfer-based attack (i.e., using FGSM, PGD and CW), decision-based attack (i.e., Boundary attack), and score-based attack (i.e., SPSA).
attack_whitebox.py: Perform white-box attacks on trained ME-Net model. We mainly focus on white-box adversarial robustness against L_infinity bounded PGD attack.

Note: The current release is for the CIFAR-10 dataset. We also test ME-Net on MNIST, SVHN, and Tiny-ImageNet dataset. The main code framework is the same for different datasets, while the only difference is the dataloader. We will update code for the remaining datasets soon.

Train ME-Net

Matrix estimation is a well studied topic with a number of established ME techniques. We mainly focus on three different ME methods throughout our study:

The universal singular value thresholding (USVT) approach
The Soft-Impute algorithm
The nuclear norm minimization algorithm

Note that one could either view the three RGB channels separately as independent matrices or jointly by concatenating them into one matrix. While the main paper follows the latter approach, we here provide an argument --me-channel to choose how you want to operate on the channels for ME. We provide comparison between the two methods later.

As ME-Net uses different masked realizations of each image during training, we use the following method to generate masks with different observing probability: for each image, we select --mask-num masks in total with observing probability ranging from --startp to --endp with equal intervals.

Common Arguments

The following arguments are used by scripts for training ME-Net, including train_pure.py, and train_adv.py:

Paths

--data-dir: directory path to read data.
--save-dir: directory path to store/load models.

Hyper-parameters

--model: choose which model to use (default: ResNet18).
--mu: the nuclear norm minimization algorithm hyper-parameter (default: 1).
--svdprob: the USVT approach hyper-parameter (default: 0.8).
--startp: the start probability of mask sampling.
--endp: the end probability of mask sampling.
--batch-size: the mini-batch size for training (default: 256).
--mask-num: the number of sampled masks (default: 10).

ME parameters

--me-channel: view RGB channels separately as independent matrices, or jointly by concatenating: separate | concat (default: concat).
--me-type: choose which method to use for matrix estimation: usvt | softimp | nucnorm (default: usvt).

Train ME-Net with Standard SGD

To train a pure ME-Net model with SGD, for example, using nucnorm with probability from 0.8 -> 1 with concat channels:

python train_pure.py --data-dir <path> \
    --save-dir <path> \
    --startp 0.8 \
    --endp 1 \
    --me-channel concat \
    --me-type nucnorm \
    <optional-arguments>

Train ME-Net with Adversarial Training

To adversarially train a ME-Net model, for example, using usvt with probability from 0.4 -> 0.6 with concat channels, under 7 steps PGD attacks:

python train_adv.py --data-dir <path> \
    --save-dir <path> \
    --startp 0.4 \
    --endp 0.6 \
    --me-channel concat \
    --me-type usvt \
    --attack \
    --iter 7 \
    <optional-arguments>

Pre-generated Datasets

Since the first step for training a pure ME-Net model is to generate a new dataset (--mask-num times larger), which can be time-consuming for certain ME methods. We provide several pre-generated datasets with different observing probabilities and different ME methods (will update soon):

An example to load such pre-generated datasets:

class CIFAR10_Dataset(Data.Dataset):

    def __init__(self, train=True, target_transform=None):
        self.target_transform = target_transform
        self.train = train

        # Loading training data
        if self.train:
            self.train_data, self.train_labels = get_data(train)
            self.train_data = np.load('/path/to/training/data/')
        # Loading testing data
        else:
            self.test_data, self.test_labels = get_data()
            self.test_data = np.load('/path/to/testing/data/')

Evaluate ME-Net

Black-box Attacks

To perform a black-box attack on a trained ME-Net model, for example, using spsa attack with 2048 samples:

python attack_blackbox.py --data-dir <path> \
    --ckpt-dir <path> \
    --name <saved-ckpt-name> \
    --attack-type spsa \
    --spsa-sample 2048 \
    <optional-arguments>

The following arguments are commonly used to perform black-box attacks:

--data-dir: directory path to read data.
--ckpt-dir: directory path to load saved model checkpoints.
--name: the name of saved checkpoints.
--maskp: the probability of mask sampling (note that for ME-Net inference we simply use the average of masking probabilities during training; one can also play with other choices such as a randomly sampled one).
--source: the source model of transfer-based black-box attacks.
--attack-type: fgsm | pgd | cw | spsa | boundary.
--epsilon: the upper bound change of L-inf norm on input pixels (default: 8).
--iter: the number of iterations for iterative attacks (default: 1000).
--cw-conf: the confidence of adversarial examples for CW attack (default: 20).
--spsa-sample: the number of SPSA samples for SPSA attack (default: 2048).

White-box Attacks

To perform a white-box attack on a trained ME-Net model, for example, using 1000 steps PGD-based BPDA attack:

python attack_whitebox.py --data-dir <path> \
    --ckpt-dir <path> \
    --name <saved-ckpt-name> \
    --attack \
    --mode pgd \
    --iter 1000 \
    <optional-arguments>

The following arguments are commonly used to perform white-box attacks:

--data-dir: directory path to read data.
--ckpt-dir: directory path to load saved model checkpoints.
--name: the name of saved checkpoints.
--maskp: the probability of mask sampling (note that for ME-Net inference we simply use the average of masking probabilities during training; one can also play with other choices such as a randomly sampled one).
--attack: perform adversarial attacks (default: True).
--epsilon: the upper bound change of L-inf norm on input pixels (default: 8).
--iter: the number of iterations for iterative attacks (default: 1000).
--mode: use toolbox or pgd implementation: toolbox | pgd (default: pgd). Note that we provide the attack implementation using Foolbox, however it can not achieve as high attack success rate as the PGD implementation.

Pre-trained Models

We provide several pre-trained ME-Net models (with both purely and adversarially trained ones) on CIFAR-10 with USVT method. Note that for different attacks, models trained with different p values can perform differently (more details can be found in our paper):

Since the saved model contains no information about the ME-Net preprocessing, one should wrap the loaded model with ME layer. An example to load pre-trained models:

# black-box attacks
model = checkpoint['model']
menet_model = MENet(model)
menet_model.eval()

# white-box attacks
net = AttackPGD(menet_model, config)
net.eval()

Representative Results

Visualization of how ME affects the input images

menet_results_0

Images are approximately low-rank

menet_results_1

Qualitative and quantitative results against black-box attacks

menet_results_2

Adversarial robustness under PGD-based BPDA white-box attacks

menet_results_3

Acknowledgements

We use the implemetation in the fancyimpute package for part of our matrix estimation algorithms. We use standard adversarial attack packages Foolbox and CleverHans for evaluating our defense.

Citation

If you find the idea or code useful for your research, please cite our paper:

@inproceedings{yang2019menet,
  title={{ME-Net}: Towards Effective Adversarial Robustness with Matrix Estimation},
  author={Yang, Yuzhe and Zhang, Guo and Katabi, Dina and Xu, Zhi},
  booktitle={Proceedings of the 36th International Conference on Machine Learning (ICML)},
  year={2019},
}