Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective [arXiv]

Overview

<img align="right" width="600" src="assets/overview.png"> This work presents a holistic study of the impact of architectural choice on adversarial robustness.

(Left) Impact of architectural components on adversarial robustness on CIFAR-10, relative to that of adversarial training methods. (Right) Chronological progress of SotA robust accuracy against AutoAttack (without additional data) on CIFAR-10 under $\ell_{\infty}$ perturbations of $\epsilon=8/255$.

Impact of Block-level Design

The design of a block primarily comprises its topology, type of convolution and kernel size, choice of activation, and normalization. We examine these elements independently through controlled experiments and propose a novel residual block, dubbed RobustResBlock, based on our observations. An overview of RobustResBlock is provided below:

<figure> <img src="assets/rrblock.png" alt="rrblock" style="width:75%"> </figure>
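
To make these axes concrete, the toy block below exposes each of them as a constructor argument. This is a simplified pre-activation block for illustration only, not the actual RobustResBlock implementation (see `models/resnet.py` for that); the class name is ours.

  import torch.nn as nn

  class ToyPreActBlock(nn.Module):
      """Minimal pre-activation residual block with swappable design choices."""
      def __init__(self, chs, kernel_size=3, conv=nn.Conv2d,
                   act=nn.ReLU, norm=nn.BatchNorm2d):
          super().__init__()
          pad = kernel_size // 2
          # topology: norm -> activation -> convolution (pre-activation ordering)
          self.branch = nn.Sequential(
              norm(chs), act(), conv(chs, chs, kernel_size, padding=pad, bias=False),
              norm(chs), act(), conv(chs, chs, kernel_size, padding=pad, bias=False),
          )

      def forward(self, x):
          return x + self.branch(x)   # identity shortcut

Swapping `act`, `norm`, `conv`, or `kernel_size` here mirrors the controlled comparisons summarized above.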

Table 1. White-box adversarial robustness of WRN with RobustResBlock

| Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | Checkpoint |
|---|---|---|---|---|---|
| $D=4$, $W=10$ | 39.6M | 6.00G | 57.70 | 54.71 | [BaiduDisk] |
| $D=5$, $W=12$ | 70.5M | 10.6G | 58.46 | 55.56 | [BaiduDisk] |
| $D=7$, $W=14$ | 133M | 19.6G | 59.41 | 56.62 | [BaiduDisk] |
| $D=11$, $W=16$ | 270M | 39.3G | 60.48 | 57.78 | [BaiduDisk] |

Impact of Network-level Design

Independent Scaling by Depth ( $D_{1}$ : $D_2$ : $D_3$ = $2$ : $2$ : $1$ )

We allow the depth of each stage ( $D_{i\in\{1,2,3\}}$ ) to vary among $\{2, 3, 4, 5, 7, 9, 11\}$. Details and pre-trained checkpoints for all $7^{3} = 343$ depth settings are available here.

<figure> <img src="assets/scale_depth.png" alt="scale_depth" style="width:100%"> </figure>

Independent Scaling by Width ( $W_{1}$ : $W_2$ : $W_3$ = $2$ : $2.5$ : $1$ )

We allow the width (in terms of widening factors) of each stage ( $W_{i\in\{1,2,3\}}$ ) to vary among $\{4, 6, 8, 10, 12, 14, 16, 20\}$. Details and pre-trained checkpoints for all $8^{3} = 512$ width settings are available here.

<figure> <img src="assets/scale_width.png" alt="scale_width" style="width:100%"> </figure>
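
For reference, the two search grids above are small enough to enumerate exhaustively; a minimal sketch (values taken from the text, variable names ours):

  from itertools import product

  depth_choices = [2, 3, 4, 5, 7, 9, 11]          # candidate per-stage depths D_i
  width_choices = [4, 6, 8, 10, 12, 14, 16, 20]   # candidate per-stage widening factors W_i

  depth_settings = list(product(depth_choices, repeat=3))   # 7^3 = 343 settings
  width_settings = list(product(width_choices, repeat=3))   # 8^3 = 512 settings
  print(len(depth_settings), len(width_settings))           # 343 512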

Interplay between Depth and Width ( $\sum D_{i}$ : $\sum W_{i}$ = $7$ : $3$ )

<figure> <img src="assets/compound_scale.png" alt="compound_scale" style="width:100%"> </figure> <figure> <img src="assets/compare_scale.png" alt="compare_scale" style="width:100%"> </figure>

Table 2. Performance of independent scaling ( $D$ or $W$ ) and compound scaling ( $D\&W$ )

| $^{\#}\rm{F}$ Target | Scale by | $D_{1}$ | $W_{1}$ | $D_{2}$ | $W_{2}$ | $D_{3}$ | $W_{3}$ | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5G | $D$ | 5 | 10 | 5 | 10 | 2 | 10 | 24.0M | 5.25G | 56.05 | 53.14 | [BaiduDisk] |
| 5G | $W$ | 4 | 11 | 4 | 13 | 4 | 6 | 24.5M | 5.71G | 56.89 | 53.87 | [BaiduDisk] |
| 5G | $D\&W$ | 14 | 5 | 14 | 7 | 7 | 3 | 17.7M | 5.09G | 57.49 | 54.78 | [BaiduDisk] |
| 10G | $D$ | 6 | 12 | 6 | 12 | 3 | 12 | 48.5M | 9.59G | 56.42 | 53.91 | [BaiduDisk] |
| 10G | $W$ | 5 | 13 | 5 | 16 | 5 | 7 | 44.4M | 10.5G | 57.06 | 54.29 | [BaiduDisk] |
| 10G | $D\&W$ | 17 | 7 | 17 | 9 | 8 | 4 | 39.3M | 9.74G | 58.06 | 55.45 | [BaiduDisk] |
| 20G | $D$ | 9 | 14 | 8 | 14 | 4 | 14 | 90.4M | 18.6G | 57.11 | 54.48 | [BaiduDisk] |
| 20G | $W$ | 7 | 16 | 7 | 18 | 7 | 8 | 81.7M | 20.4G | 58.02 | 55.34 | [BaiduDisk] |
| 20G | $D\&W$ | 22 | 8 | 22 | 11 | 11 | 5 | 74.8M | 20.3G | 58.47 | 56.14 | [BaiduDisk] |
| 40G | $D$ | 14 | 16 | 13 | 16 | 11 | 16 | 185M | 38.8G | 57.90 | 55.79 | [BaiduDisk] |
| 40G | $W$ | 11 | 18 | 11 | 21 | 11 | 9 | 170M | 42.7G | 58.48 | 56.15 | [BaiduDisk] |
| 40G | $D\&W$ | 27 | 10 | 28 | 14 | 13 | 6 | 147M | 40.4G | 58.76 | 56.59 | [BaiduDisk] |
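
As a quick sanity check on the compound rule, the $D\&W$ rows of Table 2 roughly preserve the stated $\sum D_{i} : \sum W_{i} = 7 : 3$ split; the snippet below simply re-tabulates the values from Table 2:

  # (D1, D2, D3), (W1, W2, W3) taken from the D&W rows of Table 2
  dw_configs = {
      '5G':  ((14, 14, 7),  (5, 7, 3)),
      '10G': ((17, 17, 8),  (7, 9, 4)),
      '20G': ((22, 22, 11), (8, 11, 5)),
      '40G': ((27, 28, 13), (10, 14, 6)),
  }
  for target, (D, W) in dw_configs.items():
      print(target, sum(D), sum(W), round(sum(D) / sum(W), 2))   # 7 / 3 ≈ 2.33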

Adversarially Robust Residual Networks (RobustResNets)

We use the proposed compound scaling rule to scale RobustResBlock and present a portfolio of adversarially robust residual networks.

Table 3. Comparison to SotA methods with additional 500K data

| Method | Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{AA}$ | Checkpoint |
|---|---|---|---|---|---|
| RST | WRN-28-10 | 36.5M | 5.20G | 59.53 | |
| AWP | WRN-28-10 | 36.5M | 5.20G | 60.04 | |
| HAT | WRN-28-10 | 36.5M | 5.20G | 62.50 | |
| Gowal et al. | WRN-28-10 | 36.5M | 5.20G | 62.80 | |
| Huang et al. | WRN-34-R | 68.1M | 19.1G | 62.54 | |
| Ours | RobustResNet-A1 | 19.2M | 5.11G | 63.70 | [BaiduDisk] |
| Ours | WRN-A4 | 147M | 40.4G | 65.79 | [BaiduDisk] |

How to use

1. Use our RobustResNets

  from models.resnet import PreActResNet

  depth = [D1, D2, D3]                    # per-stage depths
  channels = [16, 16*W1, 32*W2, 64*W3]    # stem width followed by per-stage channel widths
  block_types = ['robust_res_block', 'robust_res_block', 'robust_res_block']

  # Syntax
  model = PreActResNet(
    depth_configs=depth,
    channel_configs=channels,
    block_types=block_types,
    scales=8,
    base_width=10,
    cardinality=4,
    se_reduction=64,
    num_classes=10)  # 10 classes for CIFAR-10/SVHN/MNIST

  # See Table 2 "D&W" rows for D1, D2, D3 and W1, W2, W3; examples below
  RobustResNet_A1 = PreActResNet(
    depth_configs=[14, 14, 7],
    channel_configs=[5, 7, 3],
    ...)
  RobustResNet_A2 = PreActResNet(
    depth_configs=[17, 17, 8],
    channel_configs=[7, 9, 4],
    ...)
  RobustResNet_A3 = PreActResNet(
    depth_configs=[22, 22, 11],
    channel_configs=[8, 11, 5],
    ...)
  RobustResNet_A4 = PreActResNet(
    depth_configs=[27, 28, 13],
    channel_configs=[10, 14, 6],
    ...)

  # If you prefer to use WRN's basic block but with our scaling
  WRN_A1 = PreActResNet(
    depth_configs=[14, 14, 7],
    channel_configs=[5, 7, 3],
    block_types=['basic_block', 'basic_block', 'basic_block'],
    ...)
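
Once the elided arguments above are filled in, a dummy forward pass is a quick sanity check (a minimal sketch; the batch size and CIFAR-sized input are arbitrary choices of ours):

  import torch

  x = torch.randn(2, 3, 32, 32)      # dummy batch of CIFAR-sized images
  logits = RobustResNet_A1(x)
  print(logits.shape)                # expected: (2, num_classes)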

2. Use only our block, RobustResBlock

  from models.resnet import RobustResBlock
  # See Table 1 above for the performance of RobustResBlock
  block = RobustResBlock(
    in_chs, out_chs,
    kernel_size=3, 
    scales=8, 
    base_width=10, 
    cardinality=4,
    se_reduction=64,
    activation='ReLU', 
    normalization='BatchNorm')
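
As with any `nn.Module`, the block can be sanity-checked on a dummy feature map. The concrete channel count, spatial size, and the expectation that the output keeps the input resolution when `in_chs == out_chs` are our assumptions, purely for illustration:

  import torch

  block = RobustResBlock(256, 256, kernel_size=3, scales=8, base_width=10,
                         cardinality=4, se_reduction=64,
                         activation='ReLU', normalization='BatchNorm')
  x = torch.randn(4, 256, 16, 16)    # dummy NCHW feature map
  print(block(x).shape)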

3. Use our compound scaling rule, RobustScaling, to scale your custom models

Please see examples/compound_scaling.ipynb
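
The notebook contains the actual RobustScaling implementation. Purely for illustration, the sketch below shows the final allocation step implied by the ratios quoted above (per-stage depth split ≈ 2:2:1, width split ≈ 2:2.5:1); it approximately reproduces the $D\&W$ configurations in Table 2 but is not the notebook's code, and the function name is ours:

  def split_by_ratio(total, ratios):
      """Distribute an integer budget across the three stages proportionally to `ratios`."""
      s = sum(ratios)
      return [max(1, round(total * r / s)) for r in ratios]

  # e.g. RobustResNet-A1's budgets: sum of depths 35 and sum of widths 15 (i.e. 7 : 3)
  print(split_by_ratio(35, [2, 2, 1]))      # -> [14, 14, 7]
  print(split_by_ratio(15, [2, 2.5, 1]))    # -> [5, 7, 3]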

How to evaluate pre-trained models

  python eval_robustness.py \
    --data "path to data" \
    --config_file_path "path to configuration yaml file" \
    --checkpoint_path "path to checkpoint pth file" \
    --save_path "path to file for logging evaluation" \
    --attack_choice [FGSM/PGD/CW/AA] \
    --num_steps [1/20/40/0] \
    --batch_size 100  # batch size for evaluation, adjust according to your GPU memory

CIFAR-10 (TRADES)

| Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | Clean | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | AA | Checkpoint |
|---|---|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 84.62 | 55.90 | 53.15 | 51.66 | [BaiduDisk] |
| RobNet-large-v2 | 33.3M | 5.10G | 84.57 | 52.79 | 48.94 | 47.48 | [BaiduDisk] |
| AdvRush | 32.6M | 4.97G | 84.95 | 56.99 | 53.27 | 52.90 | [BaiduDisk] |
| RACL | 32.5M | 4.93G | 83.91 | 55.98 | 53.22 | 51.37 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 85.46 | 58.47 | 55.72 | 54.42 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 84.93 | 56.01 | 53.53 | 51.97 | [BaiduDisk] |
| WRN-34-R | 68.1M | 19.1G | 85.80 | 57.35 | 54.77 | 53.23 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 85.80 | 59.72 | 56.74 | 55.49 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 85.22 | 56.37 | 54.19 | 52.63 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 86.79 | 60.10 | 57.29 | 55.84 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 85.51 | 56.78 | 54.52 | 52.80 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 87.10 | 60.26 | 57.90 | 56.29 | [BaiduDisk] |

CIFAR-100 (TRADES)

| Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | Clean | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | AA | Checkpoint |
|---|---|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 56.30 | 29.91 | 26.22 | 25.26 | [BaiduDisk] |
| RobNet-large-v2 | 33.3M | 5.10G | 55.27 | 29.23 | 24.63 | 23.69 | [BaiduDisk] |
| AdvRush | 32.6M | 4.97G | 56.40 | 30.40 | 26.16 | 25.27 | [BaiduDisk] |
| RACL | 32.5M | 4.93G | 56.09 | 30.38 | 26.65 | 25.65 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 59.34 | 32.70 | 27.76 | 26.75 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 56.08 | 29.87 | 26.51 | 25.47 | [BaiduDisk] |
| WRN-34-R | 68.1M | 19.1G | 58.78 | 31.17 | 27.33 | 26.31 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 59.38 | 33.00 | 28.71 | 27.68 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 56.78 | 30.03 | 27.27 | 26.28 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 60.16 | 33.59 | 29.58 | 28.48 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 56.93 | 29.76 | 27.20 | 26.12 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 61.66 | 34.25 | 30.04 | 29.00 | [BaiduDisk] |

CIFAR-10 (SAT)

| Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | Checkpoint |
|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 52.44 | 50.97 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 57.62 | 56.06 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 52.85 | 51.36 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 58.39 | 56.99 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 53.67 | 52.95 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 58.81 | 57.60 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 54.12 | 50.52 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 59.01 | 57.85 | [BaiduDisk] |

CIFAR-10 (MART)

| Model | $^{\#}\rm{P}$ | $^{\#}\rm{F}$ | $\rm{PGD}^{20}$ | $\rm{CW}^{40}$ | Checkpoint |
|---|---|---|---|---|---|
| WRN-28-10 | 36.5M | 5.20G | 57.69 | 52.88 | [BaiduDisk] |
| RRN-A1 (ours) | 19.2M | 5.11G | 59.34 | 54.42 | [BaiduDisk] |
| WRN-34-12 | 66.5M | 9.60G | 57.40 | 53.11 | [BaiduDisk] |
| RRN-A2 (ours) | 39.0M | 10.8G | 60.33 | 55.51 | [BaiduDisk] |
| WRN-46-14 | 128M | 18.6G | 58.43 | 54.32 | [BaiduDisk] |
| RRN-A3 (ours) | 75.9M | 19.9G | 60.95 | 56.52 | [BaiduDisk] |
| WRN-70-16 | 267M | 38.8G | 58.15 | 54.37 | [BaiduDisk] |
| RRN-A4 (ours) | 147M | 39.4G | 61.88 | 57.55 | [BaiduDisk] |

How to train

Baseline adversarial training

# pick a random free port for --master_port
# --exp_name: path to where training stats will be stored
# --version: one of WRN-A1/A2/A3/A4, or RobustResNet-A1/A2/A3/A4
python -m torch.distributed.launch \
  --nproc_per_node=2 --master_port 24220 \
  main_dist.py \
  --config_path ./configs/CIFAR10 \
  --exp_name ./exps/CIFAR10 \
  --version [WRN-A1/A2/A3/A4] \
  --train \
  --data_parallel \
  --apex-amp

Advanced adversarial training

Please download the additional pseudolabeled data from Carmon et al., 2019.

# pick a random free port for --master_port
# --log-dir: path to where training stats will be stored
# --desc: name of the folder for storing training stats
# --adv-eval-freq: evaluation frequency; evaluating too often significantly slows down training
# --start-eval: start evaluating after N epochs
# --aux-data-filename: location of the downloaded pseudolabeled data
python -m torch.distributed.launch \
  --nproc_per_node=8 --master_port 14226 \
  adv-main_dist.py \
  --log-dir ./checkpoints/ \
  --config-path ./configs/Advanced_CIFAR10 \
  --version [WRN-A1/A2/A3/A4] \
  --desc drna4-basic-silu-apex-500k \
  --apex-amp --adv-eval-freq 5 \
  --start-eval 310 \
  --advnorm --adjust_bn True \
  --num-adv-epochs 400 --batch-size 1024 --lr 0.4 --weight-decay 0.0005 --beta 6.0 \
  --data-dir /datasets/ --data cifar10s \
  --aux-data-filename /datasets/ti_500K_pseudo_labeled.pickle \
  --unsup-fraction 0.7

Requirements

The code has been implemented and tested with Python 3.8.5, PyTorch 1.8.0, and NVIDIA Apex (used for mixed-precision acceleration).

Part of the code is based on the following repos: