PyTorch Image Classification

This project implements, in PyTorch, models and training techniques from a number of image-classification papers, including ResNet, ResNet-preact, WRN, DenseNet, PyramidNet, ResNeXt, shake-shake, Cutout, RandomErasing, Mixup, RICAP, and Dual-Cutout.

Requirements

pip install -r requirements.txt

Usage

python train.py --config configs/cifar/resnet_preact.yaml

Configuration values can be overridden on the command line as dotted key/value pairs (for example, train.batch_size 64), as in the examples below.

Results on CIFAR-10

Results using almost the same settings as in the papers

| Model | Test Error (median of 3 runs) | Test Error (in paper) | Training Time |
| --- | --- | --- | --- |
| VGG-like (depth 15, w/ BN, channel 64) | 7.29 | N/A | 1h20m |
| ResNet-110 | 6.52 | 6.43 (best), 6.61 +/- 0.16 | 3h06m |
| ResNet-preact-110 | 6.47 | 6.37 (median of 5 runs) | 3h05m |
| ResNet-preact-164 bottleneck | 5.90 | 5.46 (median of 5 runs) | 4h01m |
| ResNet-preact-1001 bottleneck | | 4.62 (median of 5 runs), 4.69 +/- 0.20 | |
| WRN-28-10 | 4.03 | 4.00 (median of 5 runs) | 16h10m |
| WRN-28-10 w/ dropout | | 3.89 (median of 5 runs) | |
| DenseNet-100 (k=12) | 3.87 (1 run) | 4.10 (1 run) | 24h28m* |
| DenseNet-100 (k=24) | | 3.74 (1 run) | |
| DenseNet-BC-100 (k=12) | 4.69 | 4.51 (1 run) | 15h20m |
| DenseNet-BC-250 (k=24) | | 3.62 (1 run) | |
| DenseNet-BC-190 (k=40) | | 3.46 (1 run) | |
| PyramidNet-110 (alpha=84) | 4.40 | 4.26 +/- 0.23 | 11h40m |
| PyramidNet-110 (alpha=270) | 3.92 (1 run) | 3.73 +/- 0.04 | 24h12m* |
| PyramidNet-164 bottleneck (alpha=270) | 3.44 (1 run) | 3.48 +/- 0.20 | 32h37m* |
| PyramidNet-272 bottleneck (alpha=200) | | 3.31 +/- 0.08 | |
| ResNeXt-29 4x64d | 3.89 | ~3.75 (from Figure 7) | 31h17m |
| ResNeXt-29 8x64d | 3.97 (1 run) | 3.65 (average of 10 runs) | 42h50m* |
| ResNeXt-29 16x64d | | 3.58 (average of 10 runs) | |
| shake-shake-26 2x32d (S-S-I) | 3.68 | 3.55 (average of 3 runs) | 33h49m |
| shake-shake-26 2x64d (S-S-I) | 2.88 (1 run) | 2.98 (average of 3 runs) | 78h48m |
| shake-shake-26 2x96d (S-S-I) | 2.90 (1 run) | 2.86 (average of 5 runs) | 101h32m* |

Notes

VGG-like

python train.py --config configs/cifar/vgg.yaml

ResNet

python train.py --config configs/cifar/resnet.yaml

ResNet-preact

python train.py --config configs/cifar/resnet_preact.yaml \
    train.output_dir experiments/resnet_preact_basic_110/exp00

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 164 \
    model.resnet_preact.block_type bottleneck \
    train.output_dir experiments/resnet_preact_bottleneck_164/exp00

WRN

python train.py --config configs/cifar/wrn.yaml

DenseNet

python train.py --config configs/cifar/densenet.yaml

PyramidNet

python train.py --config configs/cifar/pyramidnet.yaml \
    model.pyramidnet.depth 110 \
    model.pyramidnet.block_type basic \
    model.pyramidnet.alpha 84 \
    train.output_dir experiments/pyramidnet_basic_110_84/exp00

python train.py --config configs/cifar/pyramidnet.yaml \
    model.pyramidnet.depth 110 \
    model.pyramidnet.block_type basic \
    model.pyramidnet.alpha 270 \
    train.output_dir experiments/pyramidnet_basic_110_270/exp00

ResNeXt

python train.py --config configs/cifar/resnext.yaml \
    model.resnext.cardinality 4 \
    train.batch_size 32 \
    train.base_lr 0.025 \
    train.output_dir experiments/resnext_29_4x64d/exp00

python train.py --config configs/cifar/resnext.yaml \
    train.batch_size 64 \
    train.base_lr 0.05 \
    train.output_dir experiments/resnext_29_8x64d/exp00

shake-shake

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 32 \
    train.output_dir experiments/shake_shake_26_2x32d_SSI/exp00

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 64 \
    train.batch_size 64 \
    train.base_lr 0.1 \
    train.output_dir experiments/shake_shake_26_2x64d_SSI/exp00

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 96 \
    train.batch_size 64 \
    train.base_lr 0.1 \
    train.output_dir experiments/shake_shake_26_2x96d_SSI/exp00
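
In shake-shake regularization, the two residual branches of each block are combined with a random weight alpha on the forward pass and an independent random weight beta on the backward pass; in the S-S-I variant used above, the weights are drawn per image. A minimal autograd sketch of the branch combination (not the repo's exact code):

```python
import torch

class ShakeShakeFunction(torch.autograd.Function):
    """Mix two residual branches with independent forward/backward noise."""

    @staticmethod
    def forward(ctx, x1, x2, alpha, beta):
        ctx.save_for_backward(beta)
        return alpha * x1 + (1 - alpha) * x2  # random convex combination

    @staticmethod
    def backward(ctx, grad_output):
        beta, = ctx.saved_tensors
        # Gradients are mixed with a *different* random weight.
        return beta * grad_output, (1 - beta) * grad_output, None, None

# For S-S-I, alpha and beta have shape (N, 1, 1, 1): one sample per image.
```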

Results with data augmentation and longer training

| Model | Test Error (1 run) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| ResNet-preact-20, widening factor 4 | 4.91 | 200 | 1h26m |
| ResNet-preact-20, widening factor 4 | 4.01 | 400 | 2h53m |
| ResNet-preact-20, widening factor 4 | 3.99 | 1800 | 12h53m |
| ResNet-preact-20, widening factor 4, Cutout 16 | 3.71 | 200 | 1h26m |
| ResNet-preact-20, widening factor 4, Cutout 16 | 3.46 | 400 | 2h53m |
| ResNet-preact-20, widening factor 4, Cutout 16 | 3.76 | 1800 | 12h53m |
| ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.45 | 200 | 1h26m |
| ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.11 | 400 | 2h53m |
| ResNet-preact-20, widening factor 4, RICAP (beta=0.3) | 3.15 | 1800 | 12h53m |

| Model | Test Error (1 run) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| WRN-28-10, Cutout 16 | 3.19 | 200 | 6h35m |
| WRN-28-10, mixup (alpha=1) | 3.32 | 200 | 6h35m |
| WRN-28-10, RICAP (beta=0.3) | 2.83 | 200 | 6h35m |
| WRN-28-10, Dual-Cutout (alpha=0.1) | 2.87 | 200 | 12h42m |
| WRN-28-10, Cutout 16 | 3.07 | 400 | 13h10m |
| WRN-28-10, mixup (alpha=1) | 3.04 | 400 | 13h08m |
| WRN-28-10, RICAP (beta=0.3) | 2.71 | 400 | 13h08m |
| WRN-28-10, Dual-Cutout (alpha=0.1) | 2.76 | 400 | 25h20m |
| shake-shake-26 2x64d, Cutout 16 | 2.64 | 1800 | 78h55m* |
| shake-shake-26 2x64d, mixup (alpha=1) | 2.63 | 1800 | 35h56m |
| shake-shake-26 2x64d, RICAP (beta=0.3) | 2.29 | 1800 | 35h10m |
| shake-shake-26 2x64d, Dual-Cutout (alpha=0.1) | 2.64 | 1800 | 68h34m |
| shake-shake-26 2x96d, Cutout 16 | 2.50 | 1800 | 60h20m |
| shake-shake-26 2x96d, mixup (alpha=1) | 2.36 | 1800 | 60h20m |
| shake-shake-26 2x96d, RICAP (beta=0.3) | 2.10 | 1800 | 60h20m |
| shake-shake-26 2x96d, Dual-Cutout (alpha=0.1) | 2.41 | 1800 | 113h09m |
| shake-shake-26 2x128d, Cutout 16 | 2.58 | 1800 | 85h04m |
| shake-shake-26 2x128d, RICAP (beta=0.3) | 1.97 | 1800 | 85h06m |
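
Dual-Cutout feeds two independently Cutout-ed copies of each image through the network and penalizes disagreement between the two predictions. A sketch of the loss, assuming the consistency term is a mean-squared error between the two logits weighted by alpha (the repo's exact formulation may differ):

```python
import torch.nn.functional as F

def dual_cutout_loss(model, view1, view2, targets, alpha=0.1):
    """Average CE over two Cutout views plus an alpha-weighted
    consistency (MSE) term between the two predictions. A sketch."""
    out1, out2 = model(view1), model(view2)
    ce = 0.5 * (F.cross_entropy(out1, targets) + F.cross_entropy(out2, targets))
    return ce + alpha * F.mse_loss(out1, out2)
```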

Note

python train.py --config configs/cifar/wrn.yaml \
    train.batch_size 64 \
    train.output_dir experiments/wrn_28_10_cutout16 \
    scheduler.type cosine \
    augmentation.use_cutout True

python train.py --config configs/cifar/shake_shake.yaml \
    model.shake_shake.initial_channels 64 \
    train.batch_size 64 \
    train.base_lr 0.1 \
    scheduler.epochs 300 \
    train.output_dir experiments/shake_shake_26_2x64d_SSI_cutout16/exp00 \
    augmentation.use_cutout True
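
Cutout n masks out a random n x n square of the input image. A minimal sketch of the idea, assuming HxWxC numpy images (not necessarily the repo's exact implementation; here the patch center may fall near the border, in which case the mask is clipped):

```python
import numpy as np

def cutout(image: np.ndarray, mask_size: int = 16) -> np.ndarray:
    """Zero out a random mask_size x mask_size patch of an HxWxC image."""
    h, w = image.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)  # patch center
    y1, y2 = max(0, cy - mask_size // 2), min(h, cy + mask_size // 2)
    x1, x2 = max(0, cx - mask_size // 2), min(w, cx + mask_size // 2)
    out = image.copy()
    out[y1:y2, x1:x2] = 0
    return out
```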

Results using multiple GPUs

| Model | batch size (per GPU) | # of GPUs | Test Error (1 run) | # of Epochs | Training Time* |
| --- | --- | --- | --- | --- | --- |
| WRN-28-10, RICAP (beta=0.3) | 512 | 1 | 2.63 | 200 | 3h41m |
| WRN-28-10, RICAP (beta=0.3) | 256 | 2 | 2.71 | 200 | 2h14m |
| WRN-28-10, RICAP (beta=0.3) | 128 | 4 | 2.89 | 200 | 1h01m |
| WRN-28-10, RICAP (beta=0.3) | 64 | 8 | 2.75 | 200 | 34m |

Note

Using 1 GPU
python train.py --config configs/cifar/wrn.yaml \
    train.base_lr 0.2 \
    train.batch_size 512 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_1gpu/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False
Using 2 GPUs
python -m torch.distributed.launch --nproc_per_node 2 \
    train.py --config configs/cifar/wrn.yaml \
    train.distributed True \
    train.base_lr 0.2 \
    train.batch_size 256 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_2gpus/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False
Using 4 GPUs
python -m torch.distributed.launch --nproc_per_node 4 \
    train.py --config configs/cifar/wrn.yaml \
    train.distributed True \
    train.base_lr 0.2 \
    train.batch_size 128 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_4gpus/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False
Using 8 GPUs
python -m torch.distributed.launch --nproc_per_node 8 \
    train.py --config configs/cifar/wrn.yaml \
    train.distributed True \
    train.base_lr 0.2 \
    train.batch_size 64 \
    scheduler.epochs 200 \
    scheduler.type cosine \
    train.output_dir experiments/wrn_28_10_ricap_8gpus/exp00 \
    augmentation.use_ricap True \
    augmentation.use_random_crop False
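
All of the commands above use RICAP, which tiles crops from four different images into one training image and mixes the labels in proportion to the patch areas. A minimal sketch assuming NCHW tensors (a hypothetical helper, not the repo's code):

```python
import numpy as np
import torch

def ricap(images, targets, beta=0.3):
    """RICAP: patch together random crops of four images.

    Returns the patched batch, the four label sets, and their area
    weights; the loss is the weighted sum of the four cross-entropies.
    """
    n, _, h, w = images.shape
    # Random boundary point; each quadrant is filled with a crop of that size.
    cw = int(np.round(w * np.random.beta(beta, beta)))
    ch = int(np.round(h * np.random.beta(beta, beta)))
    widths = [cw, w - cw, cw, w - cw]
    heights = [ch, ch, h - ch, h - ch]
    offsets = [(0, 0), (0, cw), (ch, 0), (ch, cw)]
    patched = torch.zeros_like(images)
    labels, weights = [], []
    for (oy, ox), ph, pw in zip(offsets, heights, widths):
        idx = torch.randperm(n)                      # shuffle source images
        y = np.random.randint(0, h - ph + 1)         # random crop position
        x = np.random.randint(0, w - pw + 1)
        patched[:, :, oy:oy + ph, ox:ox + pw] = images[idx, :, y:y + ph, x:x + pw]
        labels.append(targets[idx])
        weights.append(ph * pw / (h * w))            # label weight = patch area
    return patched, labels, weights

# loss = sum(w * criterion(outputs, t) for w, t in zip(weights, labels))
```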

Results on FashionMNIST

| Model | Test Error (1 run) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| ResNet-preact-20, widening factor 4, Cutout 12 | 4.17 | 200 | 1h32m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 4.11 | 200 | 1h32m |
| ResNet-preact-50, Cutout 12 | 4.45 | 200 | 57m |
| ResNet-preact-50, Cutout 14 | 4.38 | 200 | 57m |
| ResNet-preact-50, widening factor 4, Cutout 12 | 4.07 | 200 | 3h37m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 4.13 | 200 | 3h39m |
| shake-shake-26 2x32d (S-S-I), Cutout 12 | 4.08 | 400 | 3h41m |
| shake-shake-26 2x32d (S-S-I), Cutout 14 | 4.05 | 400 | 3h39m |
| shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.72 | 400 | 13h46m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.85 | 400 | 13h39m |
| shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.65 | 800 | 26h42m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.60 | 800 | 26h42m |

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| ResNet-preact-20 | 5.04 | 200 | 26m |
| ResNet-preact-20, Cutout 6 | 4.84 | 200 | 26m |
| ResNet-preact-20, Cutout 8 | 4.64 | 200 | 26m |
| ResNet-preact-20, Cutout 10 | 4.74 | 200 | 26m |
| ResNet-preact-20, Cutout 12 | 4.68 | 200 | 26m |
| ResNet-preact-20, Cutout 14 | 4.64 | 200 | 26m |
| ResNet-preact-20, Cutout 16 | 4.49 | 200 | 26m |
| ResNet-preact-20, RandomErasing | 4.61 | 200 | 26m |
| ResNet-preact-20, Mixup | 4.92 | 200 | 26m |
| ResNet-preact-20, Mixup | 4.64 | 400 | 52m |


Results on MNIST

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| ResNet-preact-20 | 0.40 | 100 | 12m |
| ResNet-preact-20, Cutout 6 | 0.32 | 100 | 12m |
| ResNet-preact-20, Cutout 8 | 0.25 | 100 | 12m |
| ResNet-preact-20, Cutout 10 | 0.27 | 100 | 12m |
| ResNet-preact-20, Cutout 12 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 14 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 16 | 0.25 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=1) | 0.40 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=0.5) | 0.38 | 100 | 12m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.26 | 100 | 45m |
| ResNet-preact-50, Cutout 14 | 0.29 | 100 | 28m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 0.25 | 100 | 1h50m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.24 | 100 | 3h22m |


Results on Kuzushiji-MNIST

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| ResNet-preact-20, Cutout 14 | 0.82 (best 0.67) | 200 | 24m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.72 (best 0.67) | 200 | 1h30m |
| PyramidNet-110-270, Cutout 14 | 0.72 (best 0.70) | 200 | 10h05m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.66 (best 0.63) | 200 | 6h46m |


Experiments

Experiment on residual units, learning rate scheduling, and data augmentation

In this experiment, the effects of the following on classification accuracy are investigated:

- the structure of the residual unit: keeping or removing the first ReLU, adding or omitting a last BN, and preactivation of the shortcut after downsampling
- learning rate scheduling: cosine annealing
- data augmentation: Cutout, RandomErasing, and Mixup

ResNet-preact-56 is trained on CIFAR-10 with an initial learning rate of 0.2 in all of these runs; the residual-unit variations are sketched below.
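
A minimal sketch of the pre-activation basic block with the two unit-level options studied here, mirroring the remove_first_relu and add_last_bn config keys used in the commands below; shortcut and downsampling handling is omitted:

```python
import torch.nn as nn
import torch.nn.functional as F

class PreactBasicBlock(nn.Module):
    """Pre-activation basic block: remove_first_relu drops the ReLU before
    the first conv; add_last_bn inserts an extra BN after the second conv."""

    def __init__(self, channels, remove_first_relu=False, add_last_bn=False):
        super().__init__()
        self.remove_first_relu = remove_first_relu
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.last_bn = nn.BatchNorm2d(channels) if add_last_bn else None

    def forward(self, x):
        y = self.bn1(x)
        if not self.remove_first_relu:
            y = F.relu(y)                       # the "1st ReLU"
        y = self.conv1(y)
        y = self.conv2(F.relu(self.bn2(y)))
        if self.last_bn is not None:
            y = self.last_bn(y)                 # the "last BN"
        return x + y
```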


Results

| Model | Test Error (median of 5 runs) | Training Time |
| --- | --- | --- |
| w/ 1st ReLU, w/o last BN, preactivate shortcut after downsampling | 6.45 | 95 min |
| w/ 1st ReLU, w/o last BN | 6.47 | 95 min |
| w/o 1st ReLU, w/o last BN | 6.14 | 89 min |
| w/ 1st ReLU, w/ last BN | 6.43 | 104 min |
| w/o 1st ReLU, w/ last BN | 5.85 | 98 min |
| w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling | 6.27 | 98 min |
| w/o 1st ReLU, w/ last BN, cosine annealing | 5.72 | 98 min |
| w/o 1st ReLU, w/ last BN, Cutout | 4.96 | 98 min |
| w/o 1st ReLU, w/ last BN, RandomErasing | 5.22 | 98 min |
| w/o 1st ReLU, w/ last BN, Mixup (300 epochs) | 5.11 | 191 min |

preactivate shortcut after downsampling
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, True, True]' \
    model.resnet_preact.remove_first_relu False \
    model.resnet_preact.add_last_bn False \
    train.output_dir experiments/resnet_preact_after_downsampling/exp00

w/ 1st ReLU, w/o last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu False \
    model.resnet_preact.add_last_bn False \
    train.output_dir experiments/resnet_preact_w_relu_wo_bn/exp00

w/o 1st ReLU, w/o last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn False \
    train.output_dir experiments/resnet_preact_wo_relu_wo_bn/exp00

w/ 1st ReLU, w/ last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu False \
    model.resnet_preact.add_last_bn True \
    train.output_dir experiments/resnet_preact_w_relu_w_bn/exp00

w/o 1st ReLU, w/ last BN
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn/exp00

w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, True, True]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    train.output_dir experiments/resnet_preact_after_downsampling_wo_relu_w_bn/exp00

w/o 1st ReLU, w/ last BN, cosine annealing
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_cosine/exp00

w/o 1st ReLU, w/ last BN, Cutout
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    augmentation.use_cutout True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_cutout/exp00

w/o 1st ReLU, w/ last BN, RandomErasing
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    augmentation.use_random_erasing True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_random_erasing/exp00

w/o 1st ReLU, w/ last BN, Mixup
python train.py --config configs/cifar/resnet_preact.yaml \
    train.base_lr 0.2 \
    model.resnet_preact.depth 56 \
    model.resnet_preact.preact_stage '[True, False, False]' \
    model.resnet_preact.remove_first_relu True \
    model.resnet_preact.add_last_bn True \
    augmentation.use_mixup True \
    train.output_dir experiments/resnet_preact_wo_relu_w_bn_mixup/exp00
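
For reference, Mixup blends random pairs of training examples and mixes their labels with the same weight. A minimal sketch, assuming NCHW image tensors (a hypothetical helper, not the repo's code):

```python
import numpy as np
import torch

def mixup(images, targets, alpha=1.0):
    """Blend each example with a randomly chosen partner.

    Returns mixed images, both label sets, and the mixing weight lam;
    the loss is lam * CE(out, y_a) + (1 - lam) * CE(out, y_b).
    """
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[index]
    return mixed, targets, targets[index], lam
```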

Experiments on label smoothing, Mixup, RICAP, and Dual-Cutout

Results on CIFAR-10

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
| --- | --- | --- | --- |
| ResNet-preact-20 | 7.60 | 200 | 24m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.51 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.01) | 7.21 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.57 | 200 | 25m |
| ResNet-preact-20, mixup (alpha=1) | 7.24 | 200 | 26m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.88 | 200 | 28m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.77 | 200 | 28m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 6.24 | 200 | 45m |
| ResNet-preact-20 | 7.05 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.20 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.01) | 6.97 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.16 | 400 | 49m |
| ResNet-preact-20, mixup (alpha=1) | 6.66 | 400 | 51m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.30 | 400 | 56m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.19 | 400 | 56m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 5.55 | 400 | 1h36m |
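
Label smoothing with parameter epsilon replaces the one-hot target with (1 - epsilon) on the true class plus epsilon spread uniformly over all classes. A minimal sketch of the smoothed cross-entropy:

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_label_smoothing(logits, targets, epsilon=0.1):
    """Cross-entropy against uniformly smoothed targets. A sketch."""
    n_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    # Every class gets epsilon / n_classes; the true class gets the rest.
    smoothed = torch.full_like(log_probs, epsilon / n_classes)
    smoothed.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon + epsilon / n_classes)
    return -(smoothed * log_probs).sum(dim=1).mean()
```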


Experiments on batch size and learning rate

Linear scaling rule for learning rate
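
The linear scaling rule keeps the ratio of learning rate to batch size fixed: starting from lr 0.1 at batch size 128 (the base setting in the tables below), the lr is scaled proportionally. A minimal sketch, with the hypothetical helper name scaled_lr:

```python
def scaled_lr(batch_size: int, base_lr: float = 0.1, base_batch_size: int = 128) -> float:
    """Linear scaling rule: keep lr / batch_size constant."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(4096))  # 3.2, as in the first row below
print(scaled_lr(256))   # 0.2
```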

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.87 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.40 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 200 | 7.75 | 1h17m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 200 | 7.70 | 2h32m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 3.2 | multistep | 200 | 28.97 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | multistep | 200 | 9.07 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | multistep | 200 | 8.62 | 21m |
| ResNet-preact-20 | 512 | 0.4 | multistep | 200 | 8.23 | 20m |
| ResNet-preact-20 | 256 | 0.2 | multistep | 200 | 8.40 | 21m |
| ResNet-preact-20 | 128 | 0.1 | multistep | 200 | 8.28 | 24m |
| ResNet-preact-20 | 64 | 0.05 | multistep | 200 | 8.13 | 28m |
| ResNet-preact-20 | 32 | 0.025 | multistep | 200 | 7.58 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | multistep | 200 | 7.93 | 1h18m |
| ResNet-preact-20 | 8 | 0.006125 | multistep | 200 | 8.31 | 2h34m |

Linear scaling + longer training

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 3.2 | cosine | 400 | 8.97 | 44m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 400 | 7.85 | 43m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 400 | 7.20 | 42m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 400 | 7.83 | 40m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 400 | 7.65 | 42m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 400 | 7.09 | 47m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 400 | 7.17 | 44m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 400 | 7.24 | 2h11m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 400 | 7.26 | 4h10m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 400 | 7.02 | 7h53m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 3.2 | cosine | 800 | 8.14 | 1h29m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.74 | 1h23m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.15 | 1h31m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 800 | 7.27 | 1h25m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 800 | 7.22 | 1h26m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 800 | 7.18 | 2h20m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 800 | 7.03 | 4h16m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 800 | 6.78 | 8h37m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 800 | 6.89 | 16h47m |

Effect of initial learning rate

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 4096 | 0.8 | cosine | 200 | 10.71 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 2048 | 3.2 | cosine | 200 | 11.34 | 21m |
| ResNet-preact-20 | 2048 | 2.4 | cosine | 200 | 8.69 | 21m |
| ResNet-preact-20 | 2048 | 2.0 | cosine | 200 | 8.81 | 21m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 2048 | 0.8 | cosine | 200 | 9.62 | 21m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 1024 | 3.2 | cosine | 200 | 9.12 | 21m |
| ResNet-preact-20 | 1024 | 2.4 | cosine | 200 | 8.42 | 22m |
| ResNet-preact-20 | 1024 | 2.0 | cosine | 200 | 8.38 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 1.2 | cosine | 200 | 8.25 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 1024 | 0.4 | cosine | 200 | 8.49 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 512 | 3.2 | cosine | 200 | 8.51 | 21m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 256 | 3.2 | cosine | 200 | 9.64 | 22m |
| ResNet-preact-20 | 256 | 1.6 | cosine | 200 | 8.32 | 22m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 256 | 0.4 | cosine | 200 | 7.68 | 22m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 128 | 1.6 | cosine | 200 | 9.03 | 24m |
| ResNet-preact-20 | 128 | 0.8 | cosine | 200 | 7.54 | 24m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 128 | 0.05 | cosine | 200 | 8.81 | 24m |
| ResNet-preact-20 | 128 | 0.025 | cosine | 200 | 10.07 | 24m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 64 | 0.4 | cosine | 200 | 7.42 | 35m |
| ResNet-preact-20 | 64 | 0.2 | cosine | 200 | 7.52 | 36m |
| ResNet-preact-20 | 64 | 0.1 | cosine | 200 | 7.78 | 37m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 32 | 0.2 | cosine | 200 | 7.64 | 1h05m |
| ResNet-preact-20 | 32 | 0.1 | cosine | 200 | 7.25 | 1h08m |
| ResNet-preact-20 | 32 | 0.05 | cosine | 200 | 7.45 | 1h07m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |

Good learning rate + longer training

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 800 | 8.36 | 1h33m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.53 | 1h27m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 800 | 7.30 | 1h30m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.42 | 1h30m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 800 | 6.69 | 1h26m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 800 | 6.77 | 1h26m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 800 | 6.84 | 1h28m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 800 | 6.86 | 1h35m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 800 | 7.05 | 1h38m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 1600 | 8.25 | 3h10m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 1600 | 7.34 | 2h50m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 1600 | 6.94 | 2h52m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 1600 | 6.99 | 2h44m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 1600 | 6.95 | 2h50m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 1600 | 6.64 | 3h09m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 3200 | 9.52 | 6h15m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 3200 | 6.92 | 5h42m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 3200 | 6.96 | 5h43m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 6400 | 7.45 | 11h44m |

LARS

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.optimizer lars \
    train.base_lr 0.02 \
    train.batch_size 4096 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_lars/exp00
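
LARS (Layer-wise Adaptive Rate Scaling) multiplies the global learning rate by a per-layer trust ratio derived from the weight and gradient norms, which is why the base_lr values below are far smaller than their SGD counterparts. A sketch of the trust-ratio computation; the eta coefficient and eps are assumptions, not values from this repo:

```python
import torch

@torch.no_grad()
def lars_trust_ratio(param, grad, weight_decay=5e-4, eta=0.001, eps=1e-9):
    """Per-layer LARS trust ratio:
    local_lr = eta * ||w|| / (||g|| + weight_decay * ||w|| + eps).
    The global lr for this layer's update is multiplied by this ratio."""
    w_norm = param.norm()
    g_norm = grad.norm()
    if w_norm == 0 or g_norm == 0:
        return 1.0
    return (eta * w_norm / (g_norm + weight_decay * w_norm + eps)).item()
```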

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 4096 | 3.2 | cosine | 200 | 10.57 (1 run) | 22m |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
| ResNet-preact-20 | SGD | 4096 | 0.8 | cosine | 200 | 10.71 (1 run) | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.04 | cosine | 200 | 9.58 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.03 | cosine | 200 | 8.46 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.015 | cosine | 200 | 8.47 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.01 | cosine | 200 | 9.33 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.005 | cosine | 200 | 14.31 | 22m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 2048 | 3.2 | cosine | 200 | 11.34 (1 run) | 21m |
| ResNet-preact-20 | SGD | 2048 | 2.4 | cosine | 200 | 8.69 (1 run) | 21m |
| ResNet-preact-20 | SGD | 2048 | 2.0 | cosine | 200 | 8.81 (1 run) | 21m |
| ResNet-preact-20 | SGD | 2048 | 1.6 | cosine | 200 | 8.73 (1 run) | 22m |
| ResNet-preact-20 | SGD | 2048 | 0.8 | cosine | 200 | 9.62 (1 run) | 21m |
| ResNet-preact-20 | LARS | 2048 | 0.04 | cosine | 200 | 11.58 | 21m |
| ResNet-preact-20 | LARS | 2048 | 0.02 | cosine | 200 | 8.05 | 22m |
| ResNet-preact-20 | LARS | 2048 | 0.01 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | LARS | 2048 | 0.005 | cosine | 200 | 9.65 | 22m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 1024 | 3.2 | cosine | 200 | 9.12 (1 run) | 21m |
| ResNet-preact-20 | SGD | 1024 | 2.4 | cosine | 200 | 8.42 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 2.0 | cosine | 200 | 8.38 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 1.6 | cosine | 200 | 8.07 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 1.2 | cosine | 200 | 8.25 (1 run) | 21m |
| ResNet-preact-20 | SGD | 1024 | 0.8 | cosine | 200 | 8.08 (1 run) | 22m |
| ResNet-preact-20 | SGD | 1024 | 0.4 | cosine | 200 | 8.49 (1 run) | 22m |
| ResNet-preact-20 | LARS | 1024 | 0.02 | cosine | 200 | 9.30 | 22m |
| ResNet-preact-20 | LARS | 1024 | 0.01 | cosine | 200 | 7.68 | 22m |
| ResNet-preact-20 | LARS | 1024 | 0.005 | cosine | 200 | 8.88 | 23m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 512 | 3.2 | cosine | 200 | 8.51 (1 run) | 21m |
| ResNet-preact-20 | SGD | 512 | 1.6 | cosine | 200 | 7.73 (1 run) | 20m |
| ResNet-preact-20 | SGD | 512 | 0.8 | cosine | 200 | 7.73 (1 run) | 21m |
| ResNet-preact-20 | SGD | 512 | 0.4 | cosine | 200 | 8.22 (1 run) | 20m |
| ResNet-preact-20 | LARS | 512 | 0.015 | cosine | 200 | 9.84 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.01 | cosine | 200 | 8.05 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.0075 | cosine | 200 | 7.58 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.005 | cosine | 200 | 7.96 | 23m |
| ResNet-preact-20 | LARS | 512 | 0.0025 | cosine | 200 | 8.83 | 23m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 256 | 3.2 | cosine | 200 | 9.64 (1 run) | 22m |
| ResNet-preact-20 | SGD | 256 | 1.6 | cosine | 200 | 8.32 (1 run) | 22m |
| ResNet-preact-20 | SGD | 256 | 0.8 | cosine | 200 | 7.45 (1 run) | 21m |
| ResNet-preact-20 | SGD | 256 | 0.4 | cosine | 200 | 7.68 (1 run) | 22m |
| ResNet-preact-20 | SGD | 256 | 0.2 | cosine | 200 | 8.61 (1 run) | 22m |
| ResNet-preact-20 | LARS | 256 | 0.01 | cosine | 200 | 8.95 | 27m |
| ResNet-preact-20 | LARS | 256 | 0.005 | cosine | 200 | 7.75 | 28m |
| ResNet-preact-20 | LARS | 256 | 0.0025 | cosine | 200 | 8.21 | 28m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 128 | 1.6 | cosine | 200 | 9.03 (1 run) | 24m |
| ResNet-preact-20 | SGD | 128 | 0.8 | cosine | 200 | 7.54 (1 run) | 24m |
| ResNet-preact-20 | SGD | 128 | 0.4 | cosine | 200 | 7.28 (1 run) | 24m |
| ResNet-preact-20 | SGD | 128 | 0.2 | cosine | 200 | 7.96 (1 run) | 24m |
| ResNet-preact-20 | LARS | 128 | 0.005 | cosine | 200 | 7.96 | 37m |
| ResNet-preact-20 | LARS | 128 | 0.0025 | cosine | 200 | 7.98 | 37m |
| ResNet-preact-20 | LARS | 128 | 0.00125 | cosine | 200 | 9.21 | 37m |

| Model | optimizer | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 800 | 8.36 (1 run) | 1h33m |
| ResNet-preact-20 | SGD | 4096 | 1.6 | cosine | 1600 | 8.25 (1 run) | 3h10m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 400 | 7.53 | 44m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 800 | 7.48 | 1h29m |
| ResNet-preact-20 | LARS | 4096 | 0.02 | cosine | 1600 | 7.37 (1 run) | 2h58m |

Ghost BN

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.5 \
    train.batch_size 4096 \
    train.subdivision 32 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_ghost_batch/exp00
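
Ghost Batch Normalization computes BN statistics over fixed-size "ghost" batches rather than over the whole (large) batch, recovering the small-batch noise that helps generalization. Here train.subdivision 32 appears to split the 4096-sample batch into ghost batches of 128, matching the tables below. A minimal sketch:

```python
import torch
import torch.nn as nn

class GhostBatchNorm2d(nn.BatchNorm2d):
    """BatchNorm2d whose training statistics come from small ghost batches."""

    def __init__(self, num_features, ghost_batch_size=128, **kwargs):
        super().__init__(num_features, **kwargs)
        self.ghost_batch_size = ghost_batch_size

    def forward(self, x):
        if not self.training:
            return super().forward(x)
        bn = super().forward  # bound method, applied chunk by chunk
        chunks = x.split(self.ghost_batch_size, dim=0)
        return torch.cat([bn(c) for c in chunks], dim=0)

# e.g. replace nn.BatchNorm2d(64) with GhostBatchNorm2d(64, ghost_batch_size=128)
```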

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 8192 | N/A | 1.6 | cosine | 200 | 12.35 | 25m* |
| ResNet-preact-20 | 4096 | N/A | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 2048 | N/A | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 1024 | N/A | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 128 | N/A | 0.4 | cosine | 200 | 7.28 | 24m |

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 8192 | 128 | 1.6 | cosine | 200 | 11.51 | 27m |
| ResNet-preact-20 | 4096 | 128 | 1.6 | cosine | 200 | 9.73 | 25m |
| ResNet-preact-20 | 2048 | 128 | 1.6 | cosine | 200 | 8.77 | 24m |
| ResNet-preact-20 | 1024 | 128 | 1.6 | cosine | 200 | 7.82 | 22m |

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 8192 | N/A | 1.6 | cosine | 1600 | | |
| ResNet-preact-20 | 4096 | N/A | 1.6 | cosine | 1600 | 8.25 | 3h10m |
| ResNet-preact-20 | 2048 | N/A | 1.6 | cosine | 1600 | 7.34 | 2h50m |
| ResNet-preact-20 | 1024 | N/A | 1.6 | cosine | 1600 | 6.94 | 2h52m |

| Model | batch size | ghost batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | 8192 | 128 | 1.6 | cosine | 1600 | 11.83 | 3h37m |
| ResNet-preact-20 | 4096 | 128 | 1.6 | cosine | 1600 | 8.95 | 3h15m |
| ResNet-preact-20 | 2048 | 128 | 1.6 | cosine | 1600 | 7.23 | 3h05m |
| ResNet-preact-20 | 1024 | 128 | 1.6 | cosine | 1600 | 7.08 | 2h59m |

No weight decay on BN

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.6 \
    train.batch_size 4096 \
    train.no_weight_decay_on_bn True \
    train.weight_decay 5e-4 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_no_weight_decay_on_bn/exp00
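
The usual recipe is to place BN parameters in a separate optimizer parameter group with weight_decay 0. A sketch of such a grouping (the repo's actual logic may differ):

```python
import torch
import torch.nn as nn

def param_groups_no_bn_decay(model, weight_decay=5e-4):
    """Split parameters so BN weights/biases receive no weight decay."""
    decay, no_decay = [], []
    for module in model.modules():
        params = module.parameters(recurse=False)  # only directly owned params
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            no_decay.extend(params)
        else:
            decay.extend(params)
    return [
        {'params': decay, 'weight_decay': weight_decay},
        {'params': no_decay, 'weight_decay': 0.0},
    ]

# optimizer = torch.optim.SGD(param_groups_no_bn_decay(model), lr=1.6, momentum=0.9)
```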

| Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | yes | 5e-4 | 4096 | 1.6 | cosine | 200 | 10.81 | 22m |
| ResNet-preact-20 | yes | 4e-4 | 4096 | 1.6 | cosine | 200 | 10.88 | 22m |
| ResNet-preact-20 | yes | 3e-4 | 4096 | 1.6 | cosine | 200 | 10.96 | 22m |
| ResNet-preact-20 | yes | 2e-4 | 4096 | 1.6 | cosine | 200 | 9.30 | 22m |
| ResNet-preact-20 | yes | 1e-4 | 4096 | 1.6 | cosine | 200 | 10.20 | 22m |
| ResNet-preact-20 | no | 5e-4 | 4096 | 1.6 | cosine | 200 | 8.78 | 22m |
| ResNet-preact-20 | no | 4e-4 | 4096 | 1.6 | cosine | 200 | 9.83 | 22m |
| ResNet-preact-20 | no | 3e-4 | 4096 | 1.6 | cosine | 200 | 9.90 | 22m |
| ResNet-preact-20 | no | 2e-4 | 4096 | 1.6 | cosine | 200 | 9.64 | 22m |
| ResNet-preact-20 | no | 1e-4 | 4096 | 1.6 | cosine | 200 | 10.38 | 22m |

| Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | yes | 5e-4 | 2048 | 1.6 | cosine | 200 | 8.46 | 20m |
| ResNet-preact-20 | yes | 4e-4 | 2048 | 1.6 | cosine | 200 | 8.35 | 20m |
| ResNet-preact-20 | yes | 3e-4 | 2048 | 1.6 | cosine | 200 | 7.76 | 20m |
| ResNet-preact-20 | yes | 2e-4 | 2048 | 1.6 | cosine | 200 | 8.09 | 20m |
| ResNet-preact-20 | yes | 1e-4 | 2048 | 1.6 | cosine | 200 | 8.83 | 20m |
| ResNet-preact-20 | no | 5e-4 | 2048 | 1.6 | cosine | 200 | 8.49 | 20m |
| ResNet-preact-20 | no | 4e-4 | 2048 | 1.6 | cosine | 200 | 7.98 | 20m |
| ResNet-preact-20 | no | 3e-4 | 2048 | 1.6 | cosine | 200 | 8.26 | 20m |
| ResNet-preact-20 | no | 2e-4 | 2048 | 1.6 | cosine | 200 | 8.47 | 20m |
| ResNet-preact-20 | no | 1e-4 | 2048 | 1.6 | cosine | 200 | 9.27 | 20m |

| Model | weight decay on BN | weight decay | batch size | initial lr | lr schedule | # of Epochs | Test Error (median of 3 runs) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | yes | 5e-4 | 1024 | 1.6 | cosine | 200 | 8.45 | 21m |
| ResNet-preact-20 | yes | 4e-4 | 1024 | 1.6 | cosine | 200 | 7.91 | 21m |
| ResNet-preact-20 | yes | 3e-4 | 1024 | 1.6 | cosine | 200 | 7.81 | 21m |
| ResNet-preact-20 | yes | 2e-4 | 1024 | 1.6 | cosine | 200 | 7.69 | 21m |
| ResNet-preact-20 | yes | 1e-4 | 1024 | 1.6 | cosine | 200 | 8.26 | 21m |
| ResNet-preact-20 | no | 5e-4 | 1024 | 1.6 | cosine | 200 | 8.08 | 21m |
| ResNet-preact-20 | no | 4e-4 | 1024 | 1.6 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | no | 3e-4 | 1024 | 1.6 | cosine | 200 | 7.92 | 21m |
| ResNet-preact-20 | no | 2e-4 | 1024 | 1.6 | cosine | 200 | 7.93 | 21m |
| ResNet-preact-20 | no | 1e-4 | 1024 | 1.6 | cosine | 200 | 8.53 | 21m |

Experiments on half-precision and mixed-precision training

FP16 training

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.6 \
    train.batch_size 4096 \
    train.precision O3 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_fp16/exp00

Mixed-precision training

python train.py --config configs/cifar/resnet_preact.yaml \
    model.resnet_preact.depth 20 \
    train.base_lr 1.6 \
    train.batch_size 4096 \
    train.precision O1 \
    scheduler.type cosine \
    train.output_dir experiments/resnet_preact_mixed_precision/exp00
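
The precision values appear to follow NVIDIA Apex opt levels (O1 = mixed precision with automatic loss scaling, O3 = pure FP16), which is consistent with the results below: the plain-FP16 runs fail to train well at large batch sizes, while the mixed-precision runs stay close to FP32. A minimal sketch of the same idea using PyTorch's native torch.cuda.amp, assuming model, criterion, optimizer, and loader are already defined:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling against FP16 underflow

for images, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run ops in FP16 where safe, FP32 elsewhere
        outputs = model(images.cuda())
        loss = criterion(outputs, targets.cuda())
    scaler.scale(loss).backward()     # backprop on the scaled loss
    scaler.step(optimizer)            # unscale gradients, then update
    scaler.update()
```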

Results

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | FP32 | 8192 | 1.6 | cosine | 200 | | |
| ResNet-preact-20 | FP32 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | FP32 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | FP32 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | FP32 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | FP32 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | FP32 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | FP16 | 8192 | 1.6 | cosine | 200 | 48.52 | 33m |
| ResNet-preact-20 | FP16 | 4096 | 1.6 | cosine | 200 | 49.84 | 28m |
| ResNet-preact-20 | FP16 | 2048 | 1.6 | cosine | 200 | 75.63 | 27m |
| ResNet-preact-20 | FP16 | 1024 | 1.6 | cosine | 200 | 19.09 | 27m |
| ResNet-preact-20 | FP16 | 512 | 0.8 | cosine | 200 | 7.89 | 26m |
| ResNet-preact-20 | FP16 | 256 | 0.8 | cosine | 200 | 7.40 | 28m |
| ResNet-preact-20 | FP16 | 128 | 0.4 | cosine | 200 | 7.59 | 32m |

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | mixed | 8192 | 1.6 | cosine | 200 | 11.78 | 28m |
| ResNet-preact-20 | mixed | 4096 | 1.6 | cosine | 200 | 10.48 | 27m |
| ResNet-preact-20 | mixed | 2048 | 1.6 | cosine | 200 | 8.98 | 26m |
| ResNet-preact-20 | mixed | 1024 | 1.6 | cosine | 200 | 8.05 | 26m |
| ResNet-preact-20 | mixed | 512 | 0.8 | cosine | 200 | 7.81 | 28m |
| ResNet-preact-20 | mixed | 256 | 0.8 | cosine | 200 | 7.58 | 32m |
| ResNet-preact-20 | mixed | 128 | 0.4 | cosine | 200 | 7.37 | 41m |

Results using Tesla V100

| Model | precision | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-preact-20 | FP32 | 8192 | 1.6 | cosine | 200 | 12.35 | 25m |
| ResNet-preact-20 | FP32 | 4096 | 1.6 | cosine | 200 | 9.88 | 19m |
| ResNet-preact-20 | FP32 | 2048 | 1.6 | cosine | 200 | 8.87 | 17m |
| ResNet-preact-20 | FP32 | 1024 | 1.6 | cosine | 200 | 8.45 | 18m |
| ResNet-preact-20 | mixed | 8192 | 1.6 | cosine | 200 | 11.92 | 25m |
| ResNet-preact-20 | mixed | 4096 | 1.6 | cosine | 200 | 10.16 | 19m |
| ResNet-preact-20 | mixed | 2048 | 1.6 | cosine | 200 | 9.10 | 17m |
| ResNet-preact-20 | mixed | 1024 | 1.6 | cosine | 200 | 7.84 | 16m |
