ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation

This repository contains the implementation code for the paper ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation (ICML 2019).

ME-Net is a preprocessing-based defense method against adversarial examples, which is both model-agnostic and attack-agnostic. Being model-agnostic means ME-Net can easily be embedded into existing networks, and being attack-agnostic means ME-Net can improve adversarial robustness against a wide range of black-box and white-box attacks. Specifically, we focus on the intrinsic global structures (e.g., low-rank) within images, and leverage matrix estimation (ME) to exploit such underlying structures for better adversarial robustness.
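To make the idea concrete, the USVT method (one of the ME options used below) can be sketched in a few lines of NumPy: reconstruct a matrix from its dominant singular values only. The threshold rule and constants below are illustrative, not the paper's exact setting:

```python
import numpy as np

def usvt(X, threshold_ratio=0.3):
    """Universal Singular Value Thresholding (illustrative form):
    keep only singular values above a fraction of the largest one,
    then reconstruct the low-rank approximation."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_kept = np.where(s >= threshold_ratio * s.max(), s, 0.0)
    X_hat = (U * s_kept) @ Vt        # equals U @ diag(s_kept) @ Vt
    return np.clip(X_hat, 0.0, 1.0)  # keep pixel values in [0, 1]

# A rank-1 "image" plus small noise: thresholding suppresses the small
# noise components and recovers something close to the low-rank part.
rng = np.random.default_rng(0)
low_rank = np.outer(rng.random(32), rng.random(32))
noisy = np.clip(low_rank + 0.05 * rng.standard_normal((32, 32)), 0, 1)
recovered = usvt(noisy)
```

The intuition, as in the paper, is that natural images are approximately low-rank, while adversarial perturbations tend to lie in the small-singular-value directions that this projection discards.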


Dependencies

The current code has been tested on Ubuntu 16.04. You can install the dependencies using

pip install -r requirements.txt

Main Files

The code provided in this repository supports the following tasks:

Note: The current release covers the CIFAR-10 dataset. We also test ME-Net on the MNIST, SVHN, and Tiny-ImageNet datasets. The main code framework is the same across datasets; the only difference is the dataloader. We will release code for the remaining datasets soon.

Train ME-Net

Matrix estimation is a well-studied topic with a number of established ME techniques. We mainly focus on three ME methods throughout our study:

Note that one can treat the three RGB channels either separately, as independent matrices, or jointly, by concatenating them into one matrix. While the main paper follows the latter approach, we provide an argument --me-channel to choose how ME operates on the channels. We provide a comparison between the two methods later.
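The two channel modes can be sketched roughly as follows; `apply_me` stands in for any of the ME methods, and the `(H, W, 3)` array layout is an assumption for illustration, not necessarily the repository's internal format:

```python
import numpy as np

def me_per_channel(img, apply_me):
    """Run ME on each RGB channel independently.
    img: (H, W, 3) array with values in [0, 1]."""
    return np.stack([apply_me(img[..., c]) for c in range(3)], axis=-1)

def me_concat_channels(img, apply_me):
    """Stack the RGB channels into a single (3H, W) matrix so ME can
    exploit structure shared across channels, then split them back."""
    h, w, _ = img.shape
    stacked = img.transpose(2, 0, 1).reshape(3 * h, w)  # (3H, W)
    restored = apply_me(stacked)
    return restored.reshape(3, h, w).transpose(1, 2, 0)
```

The concatenated mode lets ME exploit correlations across channels (a low-rank structure shared by R, G, and B), at the cost of operating on a taller matrix.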

Since ME-Net uses different masked realizations of each image during training, we generate masks with varying observing probabilities as follows: for each image, we generate --mask-num masks in total, with observing probabilities ranging from --startp to --endp at equal intervals.
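This scheme can be sketched as follows; the function name is hypothetical, the argument names mirror the CLI flags, and per-pixel Bernoulli sampling is our reading of "observing probability":

```python
import numpy as np

def generate_masks(shape, mask_num, startp, endp, seed=0):
    """Generate `mask_num` binary masks whose observing probabilities
    are equally spaced from `startp` to `endp` (inclusive).  Each pixel
    is observed (kept) independently with the given probability."""
    rng = np.random.default_rng(seed)
    probs = np.linspace(startp, endp, mask_num)
    return [(rng.random(shape) < p).astype(np.float32) for p in probs]

masks = generate_masks((32, 32), mask_num=10, startp=0.8, endp=1.0)
print(masks[-1].mean())  # -> 1.0 (the last mask observes every pixel)
```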

Common Arguments

The following arguments are used by the training scripts train_pure.py and train_adv.py:

Paths

Hyper-parameters

ME parameters

Train ME-Net with Standard SGD

To train a pure ME-Net model with SGD, for example using the nucnorm method with observing probability from 0.8 to 1 and concatenated channels:

python train_pure.py --data-dir <path> \
    --save-dir <path> \
    --startp 0.8 \
    --endp 1 \
    --me-channel concat \
    --me-type nucnorm \
    <optional-arguments>

Train ME-Net with Adversarial Training

To adversarially train a ME-Net model, for example using the usvt method with observing probability from 0.4 to 0.6 and concatenated channels, under 7-step PGD attacks:

python train_adv.py --data-dir <path> \
    --save-dir <path> \
    --startp 0.4 \
    --endp 0.6 \
    --me-channel concat \
    --me-type usvt \
    --attack \
    --iter 7 \
    <optional-arguments>

Pre-generated Datasets

The first step of training a pure ME-Net model is to generate a new dataset (--mask-num times larger than the original), which can be time-consuming for certain ME methods. We therefore provide several pre-generated datasets with different observing probabilities and different ME methods (will update soon):

An example of loading such a pre-generated dataset:

import numpy as np
import torch.utils.data as Data

class CIFAR10_Dataset(Data.Dataset):

    def __init__(self, train=True, target_transform=None):
        self.target_transform = target_transform
        self.train = train

        # Loading training data: labels come from the original dataset,
        # while images are replaced by the pre-generated (masked + ME) ones
        if self.train:
            self.train_data, self.train_labels = get_data(train)
            self.train_data = np.load('/path/to/training/data/')
        # Loading testing data
        else:
            self.test_data, self.test_labels = get_data()
            self.test_data = np.load('/path/to/testing/data/')

Evaluate ME-Net

Black-box Attacks

To perform a black-box attack on a trained ME-Net model, for example using the spsa attack with 2048 samples:

python attack_blackbox.py --data-dir <path> \
    --ckpt-dir <path> \
    --name <saved-ckpt-name> \
    --attack-type spsa \
    --spsa-sample 2048 \
    <optional-arguments>

The following arguments are commonly used to perform black-box attacks:

White-box Attacks

To perform a white-box attack on a trained ME-Net model, for example using a 1000-step PGD-based BPDA attack:

python attack_whitebox.py --data-dir <path> \
    --ckpt-dir <path> \
    --name <saved-ckpt-name> \
    --attack \
    --mode pgd \
    --iter 1000 \
    <optional-arguments>

The following arguments are commonly used to perform white-box attacks:
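For context, BPDA (Backward Pass Differentiable Approximation) is the standard way to attack preprocessing defenses whose forward pass is hard to differentiate: the attacker uses the true preprocessing on the forward pass but approximates its gradient, typically by the identity, on the backward pass. A minimal PyTorch sketch of that trick, independent of this repository's code:

```python
import torch

class IdentityBackward(torch.autograd.Function):
    """Apply a (possibly non-differentiable) preprocessing on the
    forward pass, but treat it as the identity on the backward pass."""

    @staticmethod
    def forward(ctx, x, preprocess):
        return preprocess(x)

    @staticmethod
    def backward(ctx, grad_output):
        # BPDA: gradient flows through as if preprocess were identity.
        return grad_output, None

# round() has zero gradient almost everywhere, yet gradients still flow:
x = torch.randn(4, requires_grad=True)
y = IdentityBackward.apply(x, lambda t: t.round())
y.sum().backward()
print(x.grad)  # -> tensor([1., 1., 1., 1.])
```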

Pre-trained Models

We provide several pre-trained ME-Net models (both purely trained and adversarially trained) on CIFAR-10 with the USVT method. Note that against different attacks, models trained with different p values can perform differently (more details can be found in our paper):

Since the saved model contains no information about the ME-Net preprocessing, one should wrap the loaded model with the ME layer. An example of loading a pre-trained model:

import torch

# load the saved checkpoint and wrap the model with the ME layer
checkpoint = torch.load('/path/to/checkpoint')

# black-box attacks
model = checkpoint['model']
menet_model = MENet(model)
menet_model.eval()

# white-box attacks
net = AttackPGD(menet_model, config)
net.eval()
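The actual MENet class ships with the repository; purely to illustrate the idea, a wrapper of this kind masks the input, runs ME reconstruction, and feeds the result to the wrapped classifier. Everything below (the class body, mask_prob, me_reconstruct) is a hypothetical sketch, not the repository's implementation:

```python
import torch
import torch.nn as nn

class MENet(nn.Module):
    """Sketch of an ME preprocessing wrapper: mask the input, run
    matrix estimation, then classify the reconstruction."""

    def __init__(self, model, mask_prob=0.8):
        super().__init__()
        self.model = model
        self.mask_prob = mask_prob  # observing probability p

    def me_reconstruct(self, x):
        # Stand-in for USVT / nuclear norm / Soft-Impute reconstruction.
        return x

    def forward(self, x):
        # Keep each pixel independently with probability mask_prob.
        mask = (torch.rand_like(x) < self.mask_prob).float()
        return self.model(self.me_reconstruct(x * mask))
```

Because the wrapper is itself an nn.Module, the same object can be handed to a white-box attack (as with AttackPGD above) so the attack sees the full preprocessing-plus-network pipeline.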

Representative Results

Visualization of how ME affects the input images


Images are approximately low-rank


Qualitative and quantitative results against black-box attacks


Adversarial robustness under PGD-based BPDA white-box attacks


Acknowledgements

We use the implementation in the fancyimpute package for part of our matrix estimation algorithms. We use the standard adversarial attack packages Foolbox and CleverHans to evaluate our defense.

Citation

If you find the idea or code useful for your research, please cite our paper:

@inproceedings{yang2019menet,
  title={{ME-Net}: Towards Effective Adversarial Robustness with Matrix Estimation},
  author={Yang, Yuzhe and Zhang, Guo and Katabi, Dina and Xu, Zhi},
  booktitle={Proceedings of the 36th International Conference on Machine Learning (ICML)},
  year={2019},
}