RobRank: Adversarial Robustness in Deep Ranking

Deep neural networks are vulnerable to adversarial attacks, and so are deep ranking and deep metric learning models. The RobRank project studies the empirical adversarial robustness of deep ranking / metric learning models. Our contributions include (1) the definition and implementation of two new adversarial attacks, namely candidate attack and query attack; (2) two adversarial defense methods (based on adversarial training) that improve model robustness against a wide range of attacks; (3) a comprehensive empirical robustness score for quantitatively assessing adversarial robustness. In particular, an "Anti-Collapse Triplet" defense method is newly introduced in RobRank, which achieves at least 60% and at most 540% improvement in adversarial robustness compared to the defense from the previous ECCV'2020 work. See the preprint manuscript for details.

RobRank codebase is extended from my previous ECCV'2020 work "Adversarial Ranking Attack and Defense," with a major code refactor. You may find most functionalities of the previous codebase in this repository as well.

Note, the project name is RobRank, instead of RobBank.

Preprint-Title: "Adversarial Attack and Defense in Deep Ranking"
Preprint-Authors: Mo Zhou, Le Wang, Zhenxing Niu, Qilin Zhang, Nanning Zheng, Gang Hua
Preprint-Link: https://arxiv.org/abs/2106.03614
Keywords: Deep {Ranking, Metric Learning}, Adversarial {Attack, Defense, Robustness}

Project Status: Actively maintained.
Install-RobRank-Python-Dependency: $ pip install -r requirements.txt
Try-It-on-Colab: [fashion:rc2f2p:ptripletN] [cars:rres18p:ptripletN]

News and Updates

  1. [2024-02-03] This manuscript has been accepted to T-PAMI. https://ieeexplore.ieee.org/document/10433769
  2. [2022-03-02] A new paper based on this code base has been published: Enhancing Adversarial Robustness for Deep Metric Learning, CVPR, 2022. Note that this new paper further improves benign performance, adversarial robustness, and training efficiency altogether for robust metric learning.

Tables for Robustness Comparison

In the following tables, "N/A" denotes "no defense equipped"; EST is the defense proposed in the ECCV'2020 paper; ACT is the new defense in the preprint paper; HM is the defense from the CVPR'2022 paper. The rows of each table are sorted by ERS. I'm willing to add other DML defenses for comparison in these tables.

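| Dataset | Model | Loss    | Defense | R@1  | R@2  | mAP  | NMI  | ERS  |
|---------|-------|---------|---------|------|------|------|------|------|
| CUB     | RN18  | Triplet | N/A     | 53.9 | 66.4 | 26.1 | 59.5 | 3.8  |
| CUB     | RN18  | Triplet | EST     | 8.5  | 13.0 | 2.6  | 25.2 | 5.3  |
| CUB     | RN18  | Triplet | ACT     | 27.5 | 38.2 | 12.2 | 43.0 | 33.9 |
| CUB     | RN18  | Triplet | HM      | 34.9 | 45.0 | 19.8 | 47.1 | 36.0 |

| Dataset | Model | Loss    | Defense | R@1  | R@2  | mAP  | NMI  | ERS  |
|---------|-------|---------|---------|------|------|------|------|------|
| CARS    | RN18  | Triplet | N/A     | 62.5 | 74.0 | 23.8 | 57.0 | 3.6  |
| CARS    | RN18  | Triplet | EST     | 30.7 | 41.0 | 5.6  | 31.8 | 7.3  |
| CARS    | RN18  | Triplet | ACT     | 43.4 | 56.5 | 11.8 | 42.9 | 38.6 |
| CARS    | RN18  | Triplet | HM      | 60.2 | 71.6 | 33.9 | 51.2 | 46.0 |

| Dataset | Model | Loss    | Defense | R@1  | R@2  | mAP  | NMI  | ERS  |
|---------|-------|---------|---------|------|------|------|------|------|
| SOP     | RN18  | Triplet | N/A     | 62.9 | 68.5 | 39.2 | 87.4 | 4.0  |
| SOP     | RN18  | Triplet | EST     | 46.0 | 51.4 | 24.5 | 84.7 | 31.7 |
| SOP     | RN18  | Triplet | ACT     | 47.5 | 52.6 | 25.5 | 84.9 | 50.8 |
| SOP     | RN18  | Triplet | HM      | 46.8 | 51.7 | 24.5 | 84.7 | 61.6 |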

Source of these defense methods:

  1. N/A: Just standard classification network.
  2. EST: Adversarial Ranking Attack and Defense (ECCV2020)
  3. ACT: Adversarial Attack and Defense in Deep Ranking (arXiv:2106.03614)
  4. HM (or, concretely, ghmetsmi): Enhancing Adversarial Robustness for Deep Metric Learning (CVPR2022)

Datasets like MNIST and Fashion-MNIST are excluded here because they are simple toy datasets mostly for sanity testing, not for practical use.

1. Common Usage of CLI

The Python library RobRank provides these functionalities: (1) training classification or ranking (deep metric learning) models, either vanilla or defensive; (2) performing adversarial attacks against the trained models; (3) performing batched adversarial attacks. See below for detailed usage.

You can always specify the GPUs to use by export CUDA_VISIBLE_DEVICES=<GPUs>.

Environment Setup: Use the command $ pip install -r requirements.txt to install all required Python dependencies. Then you can run the test suite with pytest -v -x to make sure the code runs correctly. If pytest fails, you are welcome to open a new issue for this code repository.

1.1. Training

This step trains a deep metric learning model or a classification model, either normally or adversarially. As pytorch-lightning is used by this project, the training process will automatically use DistributedDataParallel when more than one GPU is available.

The typical usage for training a model is as follows:

python3 bin/train.py -C <dataset>:<model>:<loss>

where a "config" is composed of three colon-separated components (dataset, model, and loss), so that this mechanism is flexible enough to express many combinations.

For example:

# classification
python3 bin/train.py -C mnist:cc2f2:ce --do_test
python3 bin/train.py -C cifar10:cres18:ce   # cifar10, resnet 18 classify, CE loss
python3 bin/train.py -C cifar10:cres50:ce   # cifar10, resnet 50 classify, CE loss
# deep metric learning
python3 bin/train.py -C mnist:rc2f2:ptripletN
python3 bin/train.py -C mnist:rc2f2p:ptripletN
python3 bin/train.py -C cub:rres18:ptripletN
python3 bin/train.py -C cub:rres18p:ptripletN
python3 bin/train.py -C cars:rres18:ptripletN
python3 bin/train.py -C cars:rres18p:ptripletN
python3 bin/train.py -C sop:rres18:ptripletN
python3 bin/train.py -C sop:rres18p:ptripletN
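
As an illustration of the three-part config string, the following minimal sketch splits a `<dataset>:<model>:<loss>` value into named fields. The names `Config`, `parse_config` and `guess_log_dir` are hypothetical and are not part of RobRank; the `logs_<dataset>-<model>-<loss>` pattern is only inferred from the example checkpoint paths shown in this document.

```python
from typing import NamedTuple


class Config(NamedTuple):
    dataset: str
    model: str
    loss: str


def parse_config(spec: str) -> Config:
    """Split a '<dataset>:<model>:<loss>' string into its three parts."""
    parts = spec.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <dataset>:<model>:<loss>, got {spec!r}")
    return Config(*parts)


def guess_log_dir(spec: str) -> str:
    """Guess the log directory name (assumption: inferred from example paths)."""
    return "logs_" + "-".join(parse_config(spec))


cfg = parse_config("cub:rres18p:ptripletN")
print(cfg.dataset, cfg.model, cfg.loss)        # cub rres18p ptripletN
print(guess_log_dir("cub:rres18p:ptripletN"))  # logs_cub-rres18p-ptripletN
```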

Tips:

  1. When training DML models, export FAISS_CPU=1 to disable NMI score calculation on GPU (faiss). This could save a little bit of video memory if you encounter CUDA OOM.
  2. To change the number of PGD iterations for creating adversarial examples during the training process, create an empty file to indicate the change. For example, touch override_pgditer_8. See robrank/configs/configs_rank.py for detail.
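
The indicator-file mechanism from tip 2 can be pictured with the sketch below. It is not the code in robrank/configs/configs_rank.py, which remains the authoritative reference; the function names and the choice of scanning a single directory are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional


def pgditer_override(directory: str = ".") -> Optional[int]:
    """Return N if a file named override_pgditer_N exists in `directory`.

    If several indicator files exist, the first one in sorted name order wins.
    """
    pattern = re.compile(r"^override_pgditer_(\d+)$")
    for path in sorted(Path(directory).iterdir()):
        match = pattern.match(path.name)
        if match and path.is_file():
            return int(match.group(1))
    return None


def effective_pgditer(default: int, directory: str = ".") -> int:
    """Use the override when an indicator file is present, else the default."""
    override = pgditer_override(directory)
    return default if override is None else override
```

With an empty file `override_pgditer_8` in the scanned directory, `effective_pgditer(32)` would return 8 in this sketch; without any indicator file it returns the default that was passed in.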

1.2. Adversarial Attack

Script bin/advrank.py is the entry point for conducting adversarial attacks against a trained model. For example, to conduct CA (w=1) with several manually specified PGD parameters, you can run:

python3 bin/advrank.py -v -A CA:pm=+:W=1:eps=0.30196:alpha=0.011764:pgditer=32 -C <xxx.ckpt>

where xxx.ckpt is the path to the trained model (saved as a pytorch-lightning checkpoint). The arguments specific to adversarial attacks are joined with a colon ":" into a single string, which avoids lengthy argparse-based Python code. Example:

python3 bin/advrank.py -v -A CA:pm=+:W=1:eps=0.30196:alpha=0.011764:pgditer=32 -C logs_cub-rres18p-ptripletN/lightning_logs/version_0/checkpoints/epoch=74-step=3974.ckpt
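
To make the colon-joined attack argument concrete, here is a minimal sketch of how such a string could be split into an attack name and a parameter dictionary. `parse_attack` and `_convert` are hypothetical helpers, not the parser used by bin/advrank.py, and the int/float/string conversion rule is an assumption.

```python
from typing import Dict, Tuple, Union

Value = Union[int, float, str]


def _convert(raw: str) -> Value:
    """Try int, then float, and fall back to the raw string (e.g. '+')."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_attack(spec: str) -> Tuple[str, Dict[str, Value]]:
    """Split 'NAME:key=value:...' into the attack name and its parameters."""
    name, *fields = spec.split(":")
    params: Dict[str, Value] = {}
    for field in fields:
        key, sep, raw = field.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed field {field!r} in {spec!r}")
        params[key] = _convert(raw)
    return name, params


name, params = parse_attack(
    "CA:pm=+:W=1:eps=0.30196:alpha=0.011764:pgditer=32")
print(name, params)
```

As a side note, the eps and alpha values in this example are numerically close to 77/255 and 3/255, respectively.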

Please browse the bash scripts under the tools/ directory for examples of other types of attacks discussed in the paper. Example:

export CKPT=logs_cub-rres18p-ptripletN/lightning_logs/version_0/checkpoints/epoch=74-step=3974.ckpt
bash tools/ca.bash + $CKPT      # CA+ column
bash tools/ca.bash - $CKPT      # CA- column
bash tools/es.bash $CKPT        # ES:D and ES:R column

1.3. Batched Adversarial Attack

Script bin/swipe.py automatically conducts a batch of attacks against a specified model (pytorch-lightning checkpoint) and saves the output in json format as <model_ckpt>.ckpt.<swipe_profile>.json. Available swipe_profile values include rob28 and rob224 for the ERS score, and pami28 and pami224 for CA and QA in various settings. A full list of possible profiles can be found in robrank/cmdline.py. You can even customize the code and create your own profile for batched evaluation.

python3 bin/swipe.py -p rob28 -C logs_fashion-rc2f2-ptripletN/.../xxx.ckpt
python3 bin/swipe.py -p rob224 -C logs_cub-rres18-ptripletN/.../xxx.ckpt

You may use -m <number> (e.g. -m 10) to specify the maximum number of iterations and get a quick assessment instead of going through the whole validation dataset.

Currently only single-GPU mode is supported for attacks. When the batched attack is finished, the results will be written into a json file logs_fashion-rc2f2-ptripletN/.../xxx.ckpt.json. A helper script tools/pjswipe.py can display the content of resulting json files and calculate the corresponding ERS:

$ python3 tools/pjswipe.py logs_fashion-rc2f2-ptripletN

The script will automatically use the json file corresponding to the latest version of the specified config, so specifying the log directory is enough. That said, if multiple versions of the same config exist and you want it to print the result of an older version, export ITH=<version> (e.g. ITH=1) and run again. If the model was tested with multiple profiles, export JTYPE to select the exact profile. Read the comments in tools/pjswipe.py for details.
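
The "latest version unless ITH is set" behaviour can be sketched as follows. This is not the code of tools/pjswipe.py; the helper names are hypothetical, and the lightning_logs/version_<N> layout is taken from the example checkpoint paths in this document.

```python
import os
import re
from pathlib import Path
from typing import Dict, Optional


def pick_version_dir(log_dir: str, ith: Optional[int] = None) -> Path:
    """Pick lightning_logs/version_<N>: the largest N, or `ith` if given."""
    root = Path(log_dir) / "lightning_logs"
    versions: Dict[int, Path] = {}
    for child in root.iterdir():
        match = re.fullmatch(r"version_(\d+)", child.name)
        if match and child.is_dir():
            versions[int(match.group(1))] = child
    if not versions:
        raise FileNotFoundError(f"no version_* directory under {root}")
    chosen = max(versions) if ith is None else ith
    if chosen not in versions:
        raise FileNotFoundError(f"version_{chosen} not found under {root}")
    return versions[chosen]


def pick_from_env(log_dir: str) -> Path:
    """Honor the optional ITH environment variable when selecting a version."""
    ith = os.environ.get("ITH")
    return pick_version_dir(log_dir, int(ith) if ith else None)
```

Comparing the version numbers as integers (rather than sorting directory names as strings) keeps version_10 ahead of version_9.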

1.4. Scripts for Complete Pipeline

Please browse the escript directory for the scripts containing the command pipelines to reproduce the experiments.

2. Project Information

2.1. Directory Hierarchy

(the following directory tree is manually edited and annotated)
.
├── requirements.txt              Python deps (`pip install -r ...txt`)
├── bin/train.py                  Entrance script for training models.
├── bin/advrank.py                Entrance script for adversarial ranking.
├── bin/swipe.py                  Entrance script for batched attack.
├── robrank                       RobRank library.
│   ├── attacks                   Attack Implementations.
│   │   └── advrank*.py           Adversarial ranking attack (ECCV'2020).
│   ├── defenses/*                Defense Implementations.
│   ├── configs/*                 Configurations (incl. hyper-parameters).
│   ├── datasets/*                Dataset classes.
│   ├── models                    Models and base classes.
│   │   ├── template_classify.py  Base class for classification models.
│   │   ├── template_hybrid.py    Base class for Classification+DML models.
│   │   └── template_rank.py      Base class for DML/ranking models.
│   ├── losses/*                  Deep metric learning loss functions.
│   ├── cmdline.py                Command line interface implementation.
│   └── utils.py                  Miscellaneous utilities.
└── tools/*                       Miscellaneous tools for experiments.

2.2. Tested Platform

Tested Software versions:

OS: Debian unstable, Debian Bullseye, Ubuntu 20.04 LTS, Ubuntu 16.04 LTS
Python (anaconda distribution): 3.8.5, 3.9.X
PyTorch: 1.7.1, 1.8.1, 1.11.0
PyTorch-Lightning: see requirements.txt

Mainly Tested Hardware:

CPU: Intel Xeon Family
GPU: Nvidia GTX1080Ti, Titan Xp, RTX3090, A5000, A6000, A100

With 8 RTX3090 GPUs, most experiments can be finished within 1 day. With older configurations (such as 4* GTX1080Ti), most experiments can be finished within 3 days, including adversarial training.

Memory requirement: 12GB video memory is required for adversarial training of RN18, Mnas, and IBN. Additionally, adversarial training of RN50 requires 24GB.

If you encounter the following error message:

Traceback (most recent call last):
  File "bin/train.py", line 16, in <module>
    import robrank as rr
ModuleNotFoundError: No module named 'robrank'

Just try export PYTHONPATH=. and run your command again.
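
Where exporting PYTHONPATH is inconvenient (for example in a notebook), a similar effect can be approximated from Python by prepending the repository root to sys.path before importing robrank. The helper name `add_repo_root` is hypothetical, and the sketch assumes it is called with the path of the project root.

```python
import sys
from pathlib import Path


def add_repo_root(root: str = ".") -> str:
    """Prepend the repository root to sys.path if it is not already there."""
    resolved = str(Path(root).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Call this before `import robrank as rr`, from the project root directory.
repo_root = add_repo_root(".")
```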

2.3. Dataset Preparation

The default data path setting for any dataset can be found in robrank/configs/configs_dataset.py.

MNIST and Fashion-MNIST are downloaded using torchvision. The helper script bin/download.py can download and extract the two datasets for you. Just do as follows in your terminal from the root directory of this project:

$ export PYTHONPATH=.
$ python3 bin/download.py

Then the MNIST and Fashion-MNIST datasets are ready to use. Try to train a model.

The remaining datasets, namely CUB-200-2011, Cars-196, and Stanford Online Products, can be downloaded from their corresponding websites (and then manually extracted).

CUB: The tarball can be downloaded from http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz. Then change your working directory to ~/.torch and run tar xvf <path>/CUB_200_2011.tgz -C . (the trailing dot is the extraction target, i.e. the current directory). Now we are all set.

CARS: Create a directory ~/.torch/cars, then change your working directory into it. Download http://imagenet.stanford.edu/internal/car196/car_ims.tgz and http://imagenet.stanford.edu/internal/car196/cars_annos.mat into the directory. Finally, extract the tarball with tar xvf car_ims.tgz. We are ready to go.

SOP: After you downloaded Stanford_Online_Products.zip from ftp://cs.stanford.edu/cs/cvgl/Stanford_Online_Products.zip, just do $ cd ~/.torch and $ unzip <path>/Stanford_Online_Products.zip. Now SOP is ready to use.

The dataset loader is able to read the dataset from /dev/shm to overcome the IO bottleneck (especially from HDDs) if a copy of the dataset is available there. For instance, rsync -av ~/.torch/Stanford_Online_Products /dev/shm.
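
The idea of preferring a /dev/shm copy can be sketched as below. This is not RobRank's dataset loader; `resolve_dataset_path` is a hypothetical helper, and matching the copy by its base directory name is an assumption based on the rsync example above.

```python
import os


def resolve_dataset_path(default_path: str, shm_root: str = "/dev/shm") -> str:
    """Prefer a copy under `shm_root` (same base name) when it exists."""
    expanded = os.path.expanduser(default_path)
    name = os.path.basename(os.path.normpath(expanded))
    candidate = os.path.join(shm_root, name)
    # Fall back to the default location when no in-memory copy is present.
    return candidate if os.path.isdir(candidate) else expanded


print(resolve_dataset_path("~/.torch/Stanford_Online_Products"))
```

After the rsync command above, this sketch would return /dev/shm/Stanford_Online_Products; otherwise it returns the expanded default path.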

CIFAR: For cifar10, run cd ~/.torch/; wget -c https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz; tar xvf cifar-10-python.tar.gz. For cifar100, run cd ~/.torch/; wget -c https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz; tar xvf cifar-100-python.tar.gz.

2.4. References and Bibtex

If you found the paper/code useful/inspiring, please consider citing my work:

@misc{robrank,
      title={Adversarial Attack and Defense in Deep Ranking}, 
      author={Mo Zhou and Le Wang and Zhenxing Niu and Qilin Zhang and Nanning Zheng and Gang Hua},
      year={2021},
      eprint={2106.03614},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Bibtex of M. Zhou, et al. "Adversarial Ranking Attack and Defense," ECCV'2020. can be found in the linked page.

Reference Software Projects:

  1. https://github.com/Confusezius/Deep-Metric-Learning-Baselines
  2. https://github.com/Confusezius/Revisiting_Deep_Metric_Learning_PyTorch
  3. https://github.com/idstcv/SoftTriple
  4. https://github.com/KevinMusgrave/pytorch-metric-learning
  5. https://github.com/RobustBench/robustbench
  6. https://github.com/fra31/auto-attack
  7. https://github.com/KevinMusgrave/powerful-benchmarker
  8. https://github.com/MadryLab/robustness

Frequently Asked Questions

A: As you may have found, there are lots of leftover attempts towards a better defense in robrank/defenses, and renames during the research process also resulted in some inconsistency. So I'd better directly point out the code positions here:
(1) hm_training_step in defenses/amd.py is the Hardness Manipulation (HM) defense. The function for creating adversarial examples for adversarial training is MadryInnerMax.HardnessManipulate in the same file.
(2) pnp_training_step in defenses/pnp.py is the Anti-Collapse Triplet (ACT) defense. The function for creating adversarial examples for adversarial training is PositiveNegativePerplexing.pncollapse in the same file.
(3) est_training_step in defenses/est.py is the Embedding-Shift Triplet (EST) defense. The function for creating adversarial examples for adversarial training is the ES attack from the AdvRank class.

A: I hate Nvidia for such weird issues. The reason for distributed data parallel getting stuck varies across situations and machines. Here are a bunch of tricks that might or might not work:
(1) Comment out th.distributed.barrier() from the code and run again. You can locate that barrier function in the code using ripgrep. This seemed effective on RTX3090;
(2) use rank_zero_only option for pytorch-lightning logger: sed -i robrank/models/template_rank.py -e "s/self.log(\(.*\))/self.log(\1, rank_zero_only=True)/g";
(3) change the distributed backend of pytorch: export PL_TORCH_DISTRIBUTED_BACKEND=gloo;
(4) disable P2P feature for NCCL. export NCCL_P2P_DISABLE=1;
(5) change strategy from ddp to ddp_spawn in robrank/cmdline.py. Run the training again and let it raise error. Then change back to ddp and the A5000 started working;
(6) P2P GPU traffic will fail with IOMMU. Check the p2pBandwidthLatencyTest cuda example and see whether it can run. If not, then it's not a pytorch issue. Disabling IOMMU via a kernel parameter should work: set GRUB_CMDLINE_LINUX="iommu=soft" in /etc/default/grub and run sudo update-grub2 after the edit. The Linux kernel documentation describes this iommu parameter. IOMMU group assignment can be found under /sys/kernel/iommu_groups;
(7) Use only even/odd numbered GPUs, e.g. CUDA_VISIBLE_DEVICES=1,3,5 instead of CUDA_VISIBLE_DEVICES=1,2,3. This works sometimes for at least the p2pBandwidthLatencyTest test program;
(8) turn off ACS in BIOS;
(9) change num_workers=0 for dataloader.

A: They are equivalent due to the implementation details in the dataset sampler. It is a fixable problem (but not necessary). See issue #9.

RTX A5000 performance is similar to RTX 3090. RTX A6000 is slightly faster than RTX 3090. Nvidia A100 is roughly 1.5 times faster than RTX 3090. RTX 3090 is roughly 2~3 times faster than Nvidia Titan Xp (or GTX 1080Ti). In the following table, eta is exactly the PGD iteration number (pgditer). It can be overridden by file indicators like override_pgditer_8, as described earlier in this document. Time cost on MNIST and Fashion-MNIST is expected to be identical. For the remaining datasets, the order of time consumption is CUB < CARS < SOP.

| Config                        | eta | GPU Model | Number of GPUs | Time (roughly) |
|-------------------------------|-----|-----------|----------------|----------------|
| fashion:rc2f2:ptripletN       | N/A | Titan Xp  | 2 (DDP)        | 2 min          |
| fashion:rc2f2p:ptripletN      | 32  | Titan Xp  | 2 (DDP)        | 10 min         |
| cub:rres18:ptripletN          | N/A | Titan Xp  | 2 (DDP)        | 30 min         |
| cub:rres18p:ptripletN         | 8   | Titan Xp  | 2 (DDP)        | 130 min        |
| cub:rres18p:ptripletN         | 32  | Titan Xp  | 2 (DDP)        | 420 min        |
| cub:rres18ghmetsmi:ptripletN  | 32  | Titan Xp  | 2 (DDP)        | 470 min        |
| cars:rres18p:ptripletN        | 8   | Titan Xp  | 2 (DDP)        | 180 min        |
| cars:rres18ghmetsmi:ptripletN | 32  | Titan Xp  | 2 (DDP)        | 530 min        |
| sop:rres18:ptripletN          | N/A | RTX A5000 | 4 (DDP)        | 60 min         |
| sop:rres18:ptripletN          | N/A | RTX A6000 | 2 (DDP)        | 120 min        |
| sop:rres18p:ptripletN         | 8   | RTX A6000 | 2 (DDP)        | 560 min        |
| sop:rres18p:ptripletN         | 32  | RTX A6000 | 2 (DDP)        | 1830 min       |

See the model card for download links.

Copyright and License

Copyright (C) 2019-2022, Mo Zhou <cdluminate@gmail.com>

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.