Home

This is a fork of the original PyTorch Image Classification

PyTorch Image Classification

The following papers are implemented in PyTorch.

Requirements

Usage

$ ./main.py --arch resnet_preact --depth 56 --outdir results

Use Cutout

$ ./main.py --arch resnet_preact --depth 56 --outdir results --use_cutout
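
Cutout masks a random square region of each training image with zeros. The repository implements this behind `--use_cutout`; the snippet below is only a minimal NumPy sketch of the idea, not the repository's code (the function and argument names are illustrative):

```python
import numpy as np

def cutout(image, size, rng=None):
    """Zero out one random size-by-size square patch of an HxW(xC) image.

    The patch center is sampled uniformly over the image, so the patch
    may be clipped at the border.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y0:y1, x0:x1] = 0  # masked region; mean-pixel fill is another common choice
    return out
```

`--cutout_size` in the commands on this page corresponds to `size` here.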

Use RandomErasing

$ ./main.py --arch resnet_preact --depth 56 --outdir results --use_random_erasing
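
Random Erasing replaces a randomly sized, randomly placed rectangle of the image with random values. A hedged NumPy sketch, using the paper's area and aspect-ratio ranges as illustrative defaults (the `--use_random_erasing` implementation in this repository may use different parameters):

```python
import numpy as np

def random_erasing(image, area_range=(0.02, 0.4), aspect_range=(0.3, 3.333),
                   p=0.5, rng=None):
    """With probability p, overwrite one random rectangle with uniform noise."""
    rng = rng or np.random.default_rng()
    if rng.random() > p:
        return image
    h, w = image.shape[:2]
    for _ in range(100):  # retry until a sampled rectangle fits inside the image
        area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(*aspect_range)
        eh = int(round(np.sqrt(area * aspect)))
        ew = int(round(np.sqrt(area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            y, x = int(rng.integers(h - eh)), int(rng.integers(w - ew))
            out = image.copy()
            out[y:y + eh, x:x + ew] = rng.uniform(0, 1, size=out[y:y + eh, x:x + ew].shape)
            return out
    return image
```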

Use Mixup

$ ./main.py --arch resnet_preact --depth 56 --outdir results --use_mixup
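
Mixup trains on convex combinations of pairs of examples and of their one-hot labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. A minimal sketch (in practice this is applied per batch; here it is shown per pair):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Blend two examples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```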

Use cosine annealing

$ ./main.py --arch wrn --outdir results --scheduler cosine
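
With `--scheduler cosine`, the learning rate is annealed from its initial value down to (near) zero along a half cosine curve. Ignoring warmup and per-iteration granularity, the schedule is:

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Cosine annealing: base_lr at epoch 0, decaying to 0 at total_epochs."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))
```

For example, with `base_lr=0.2` and 200 epochs, the rate is 0.2 at the start, 0.1 at epoch 100, and 0 at the end.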

Results on Kuzushiji-49

Comparison of models and different batch sizes

| Model | Batch size | Balanced accuracy | # of epochs | Training time |
|---|---|---|---|---|
| DenseNet-100 (k=12) | 1536 | 96.03 | 1000 | 34h27m |
| DenseNet-100 (k=12) | 1536 | 97.32 | 1500 | 47h39m |
| Shake-Shake-26 2x96d | 512 | 97.41 | 1000 | 47h21m |
| Shake-Shake-26 2x96d | 1024 | 97.57 | 1000 | 41h14m |

Comparison of different settings with the Shake-Shake model

| Model | Batch size | Balanced accuracy | # of epochs | Training time |
|---|---|---|---|---|
| Shake-Shake-26 2x96d | 1024 | 97.64 | 1100 | 47h25m |
| Shake-Shake-26 2x96d * | 2048 | 97.72 | 1100 | 21h45m |
| Shake-Shake-26 2x96d * | 2048 | 98.00 | 1800 | 34h25m |
| Shake-Shake-26 2x96d (cutout 14) | 1024 | 98.10 | 1100 | 47h3m |
| Shake-Shake-26 2x96d (mixup alpha=1) | 1024 | 97.42 | 1100 | 47h14m |
| Shake-Shake-26 2x96d (cutout 14) * | 2048 | 98.16 | 1100 | 23h27m |
| Shake-Shake-26 2x96d (cutout 14) * | 2048 | 98.29 | 1800 | 36h15m |

* Experiments marked with an asterisk were run on eight Tesla V100 GPUs; all other experiments were run on four Tesla P100 GPUs.

Here are the training arguments used to achieve the best balanced accuracy.

python train.py --dataset K49 --arch shake_shake --depth 26 --base_channels 96 --shake_forward True --shake_backward True --shake_image True --seed 7 --outdir results/k49/shake_shake_26_2x96d_cutout14/04 --epochs 1800 --scheduler cosine --base_lr 0.2 --batch_size 2048 --use_cutout --cutout_size 14
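
Kuzushiji-49 is heavily class-imbalanced, which is why the tables above report balanced accuracy (mean per-class recall) rather than plain accuracy. A minimal sketch of the metric:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class counts equally regardless of frequency."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

Unlike plain accuracy, a rare class contributes as much to the score as a frequent one.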

Results on CIFAR-10

Results using almost the same settings as the papers

| Model | Test Error (median of 3 runs) | Test Error (in paper) | Training Time |
|---|---|---|---|
| VGG-like (depth 15, w/ BN, channel 64) | 7.29 | N/A | 1h20m |
| ResNet-110 | 6.52 | 6.43 (best), 6.61 +/- 0.16 | 3h06m |
| ResNet-preact-110 | 6.47 | 6.37 (median of 5 runs) | 3h05m |
| ResNet-preact-164 bottleneck | 5.90 | 5.46 (median of 5 runs) | 4h01m |
| ResNet-preact-1001 bottleneck | | 4.62 (median of 5 runs), 4.69 +/- 0.20 | |
| WRN-28-10 | 4.03 | 4.00 (median of 5 runs) | 16h10m |
| WRN-28-10 w/ dropout | | 3.89 (median of 5 runs) | |
| DenseNet-100 (k=12) | 3.87 (1 run) | 4.10 (1 run) | 24h28m* |
| DenseNet-100 (k=24) | | 3.74 (1 run) | |
| DenseNet-BC-100 (k=12) | 4.69 | 4.51 (1 run) | 15h20m |
| DenseNet-BC-250 (k=24) | | 3.62 (1 run) | |
| DenseNet-BC-190 (k=40) | | 3.46 (1 run) | |
| PyramidNet-110 (alpha=84) | 4.40 | 4.26 +/- 0.23 | 11h40m |
| PyramidNet-110 (alpha=270) | 3.92 (1 run) | 3.73 +/- 0.04 | 24h12m* |
| PyramidNet-164 bottleneck (alpha=270) | 3.44 (1 run) | 3.48 +/- 0.20 | 32h37m* |
| PyramidNet-272 bottleneck (alpha=200) | | 3.31 +/- 0.08 | |
| ResNeXt-29 4x64d | 3.89 | ~3.75 (from Figure 7) | 31h17m |
| ResNeXt-29 8x64d | 3.97 (1 run) | 3.65 (average of 10 runs) | 42h50m* |
| ResNeXt-29 16x64d | | 3.58 (average of 10 runs) | |
| shake-shake-26 2x32d (S-S-I) | 3.68 | 3.55 (average of 3 runs) | 33h49m |
| shake-shake-26 2x64d (S-S-I) | 2.88 (1 run) | 2.98 (average of 3 runs) | 78h48m |
| shake-shake-26 2x96d (S-S-I) | 2.90 (1 run) | 2.86 (average of 5 runs) | 101h32m* |

Notes

VGG-like

$ python -u main.py --arch vgg --seed 7 --outdir results/vgg_15_BN_64/00

ResNet

$ python -u main.py --arch resnet --depth 110 --block_type basic --seed 7 --outdir results/resnet_basic_110/00

ResNet-preact

$ python -u main.py --arch resnet_preact --depth 110 --block_type basic --seed 7 --outdir results/resnet_preact_basic_110/00

$ python -u main.py --arch resnet_preact --depth 164 --block_type bottleneck --seed 7 --outdir results/resnet_preact_bottleneck_164/00

WRN

$ python -u main.py --arch wrn --depth 28 --widening_factor 10 --seed 7 --outdir results/wrn_28_10/00

DenseNet

$ python -u main.py --arch densenet --depth 100 --block_type bottleneck --growth_rate 12 --compression_rate 0.5 --batch_size 32 --base_lr 0.05 --seed 7 --outdir results/densenet_BC_100_12/00

PyramidNet

$ python -u main.py --arch pyramidnet --depth 110 --block_type basic --pyramid_alpha 84 --seed 7 --outdir results/pyramidnet_basic_110_84/00

$ python -u main.py --arch pyramidnet --depth 110 --block_type basic --pyramid_alpha 270 --seed 7 --outdir results/pyramidnet_basic_110_270/00

ResNeXt

$ python -u main.py --arch resnext --depth 29 --cardinality 4 --base_channels 64 --batch_size 32 --base_lr 0.025 --seed 7 --outdir results/resnext_29_4x64d/00

$ python -u main.py --arch resnext --depth 29 --cardinality 8 --base_channels 64 --batch_size 64 --base_lr 0.05 --seed 7 --outdir results/resnext_29_8x64d/00

shake-shake

$ python -u main.py --arch shake_shake --depth 26 --base_channels 32 --shake_forward True --shake_backward True --shake_image True --seed 7 --outdir results/shake_shake_26_2x32d_SSI/00

$ python -u main.py --arch shake_shake --depth 26 --base_channels 64 --shake_forward True --shake_backward True --shake_image True --batch_size 64 --base_lr 0.1 --seed 7 --outdir results/shake_shake_26_2x64d_SSI/00

$ python -u main.py --arch shake_shake --depth 26 --base_channels 96 --shake_forward True --shake_backward True --shake_image True --seed 7 --outdir results/shake_shake_26_2x96d_SSI/00
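
In shake-shake regularization, each residual block has two parallel branches whose outputs are combined with a random convex weight, resampled on every forward pass; at test time the branches are simply averaged. The backward pass uses an independent random weight (`--shake_backward`), which requires a custom autograd function and cannot be expressed in this plain NumPy sketch; `branch1` and `branch2` below are hypothetical callables standing in for the two residual branches:

```python
import numpy as np

def shake_shake_forward(x, branch1, branch2, training=True, rng=None):
    """Forward pass of one shake-shake block: x + alpha*b1(x) + (1-alpha)*b2(x),
    with alpha ~ U(0, 1) during training and alpha = 0.5 at test time."""
    rng = rng or np.random.default_rng()
    alpha = rng.random() if training else 0.5
    return x + alpha * branch1(x) + (1 - alpha) * branch2(x)
```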

Results

| Model | Test Error (1 run) | # of Epochs | Training Time |
|---|---|---|---|
| WRN-28-10, Cutout 16 | 3.19 | 200 | 16h23m* |
| WRN-28-10, mixup (alpha=1) | 3.32 | 200 | 6h35m |
| WRN-28-10, RICAP (beta=0.3) | 2.83 | 200 | 6h35m |
| WRN-28-10, Dual-Cutout (alpha=0.1) | 2.87 | 200 | 12h42m |
| WRN-28-10, Cutout 16 | 3.07 | 400 | 13h10m |
| WRN-28-10, mixup (alpha=1) | 3.04 | 400 | 13h08m |
| WRN-28-10, RICAP (beta=0.3) | 2.71 | 400 | 13h08m |
| WRN-28-10, Dual-Cutout (alpha=0.1) | 2.76 | 400 | 25h20m |
| shake-shake-26 2x64d, Cutout 16 | 2.64 | 1800 | 78h55m* |
| shake-shake-26 2x64d, mixup (alpha=1) | 2.63 | 1800 | 35h56m |
| shake-shake-26 2x64d, RICAP (beta=0.3) | 2.29 | 1800 | 35h10m |
| shake-shake-26 2x64d, Dual-Cutout (alpha=0.1) | | 1800 | |
| shake-shake-26 2x96d, Cutout 16 | 2.50 | 1800 | 60h20m |
| shake-shake-26 2x96d, mixup (alpha=1) | 2.36 | 1800 | 60h20m |
| shake-shake-26 2x96d, RICAP (beta=0.3) | 2.10 | 1800 | 60h20m |
| shake-shake-26 2x96d, Dual-Cutout (alpha=0.1) | 2.41 | 1800 | 113h09m |

Note

python -u main.py --arch wrn --depth 28 --outdir results/wrn_28_10_cutout16 --epochs 200 --scheduler cosine --base_lr 0.1 --batch_size 64 --seed 17 --use_cutout --cutout_size 16

python -u main.py --arch shake_shake --depth 26 --base_channels 64 --outdir results/shake_shake_26_2x64d_SSI_cutout16 --epochs 300 --scheduler cosine --base_lr 0.1 --batch_size 64 --seed 17 --use_cutout --cutout_size 16
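
RICAP (random image cropping and patching) builds each training image from crops of four different images arranged in a 2x2 layout, and mixes their one-hot labels in proportion to patch area. A hedged NumPy sketch of the idea, not the exact implementation behind the results above:

```python
import numpy as np

def ricap(images, labels, beta=0.3, rng=None):
    """Patch crops of four HxW images into one; labels is a list of four
    one-hot vectors, mixed in proportion to each patch's area."""
    rng = rng or np.random.default_rng()
    h, w = images[0].shape[:2]
    # boundary point splitting the canvas into four patches
    bh = int(round(h * rng.beta(beta, beta)))
    bw = int(round(w * rng.beta(beta, beta)))
    sizes = [(bh, bw), (bh, w - bw), (h - bh, bw), (h - bh, w - bw)]
    corners = [(0, 0), (0, bw), (bh, 0), (bh, bw)]
    out = np.zeros_like(images[0])
    label = np.zeros_like(labels[0], dtype=float)
    for img, lab, (ph, pw), (y, x) in zip(images, labels, sizes, corners):
        if ph == 0 or pw == 0:
            continue  # degenerate patch contributes nothing
        cy, cx = int(rng.integers(h - ph + 1)), int(rng.integers(w - pw + 1))
        out[y:y + ph, x:x + pw] = img[cy:cy + ph, cx:cx + pw]
        label += (ph * pw) / (h * w) * np.asarray(lab, dtype=float)
    return out, label
```

The four patch areas always sum to the full canvas, so the mixed label still sums to one.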

Results on FashionMNIST

| Model | Test Error (1 run) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20, widening factor 4, Cutout 12 | 4.17 | 200 | 1h32m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 4.11 | 200 | 1h32m |
| ResNet-preact-50, Cutout 12 | 4.45 | 200 | 57m |
| ResNet-preact-50, Cutout 14 | 4.38 | 200 | 57m |
| ResNet-preact-50, widening factor 4, Cutout 12 | 4.07 | 200 | 3h37m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 4.13 | 200 | 3h39m |
| shake-shake-26 2x32d (S-S-I), Cutout 12 | 4.08 | 400 | 3h41m |
| shake-shake-26 2x32d (S-S-I), Cutout 14 | 4.05 | 400 | 3h39m |
| shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.72 | 400 | 13h46m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.85 | 400 | 13h39m |
| shake-shake-26 2x96d (S-S-I), Cutout 12 | 3.65 | 800 | 26h42m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 3.60 | 800 | 26h42m |

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20 | 5.04 | 200 | 26m |
| ResNet-preact-20, Cutout 6 | 4.84 | 200 | 26m |
| ResNet-preact-20, Cutout 8 | 4.64 | 200 | 26m |
| ResNet-preact-20, Cutout 10 | 4.74 | 200 | 26m |
| ResNet-preact-20, Cutout 12 | 4.68 | 200 | 26m |
| ResNet-preact-20, Cutout 14 | 4.64 | 200 | 26m |
| ResNet-preact-20, Cutout 16 | 4.49 | 200 | 26m |
| ResNet-preact-20, RandomErasing | 4.61 | 200 | 26m |
| ResNet-preact-20, Mixup | 4.92 | 200 | 26m |
| ResNet-preact-20, Mixup | 4.64 | 400 | 52m |

Note

Results on MNIST

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20 | 0.40 | 100 | 12m |
| ResNet-preact-20, Cutout 6 | 0.32 | 100 | 12m |
| ResNet-preact-20, Cutout 8 | 0.25 | 100 | 12m |
| ResNet-preact-20, Cutout 10 | 0.27 | 100 | 12m |
| ResNet-preact-20, Cutout 12 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 14 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 16 | 0.25 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=1) | 0.40 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=0.5) | 0.38 | 100 | 12m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.26 | 100 | 45m |
| ResNet-preact-50, Cutout 14 | 0.29 | 100 | 28m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 0.25 | 100 | 1h50m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.24 | 100 | 3h22m |

Note

Results on Kuzushiji-MNIST

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20, Cutout 14 | 0.82 (best 0.67) | 200 | 24m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.72 (best 0.67) | 200 | 1h30m |
| PyramidNet-110-270, Cutout 14 | 0.72 (best 0.70) | 200 | 10h05m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.66 (best 0.63) | 200 | 6h46m |

Note

Experiments

Experiment on residual units, learning rate scheduling, and data augmentation

In this experiment, the effects of residual unit design, learning rate scheduling, and data augmentation on classification accuracy are investigated.

ResNet-preact-56 is trained on CIFAR-10 with an initial learning rate of 0.2 in this experiment.

Note

Results

| Model | Test Error (median of 5 runs) | Training Time |
|---|---|---|
| w/ 1st ReLU, w/o last BN, preactivate shortcut after downsampling | 6.45 | 95 min |
| w/ 1st ReLU, w/o last BN | 6.47 | 95 min |
| w/o 1st ReLU, w/o last BN | 6.14 | 89 min |
| w/ 1st ReLU, w/ last BN | 6.43 | 104 min |
| w/o 1st ReLU, w/ last BN | 5.85 | 98 min |
| w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling | 6.27 | 98 min |
| w/o 1st ReLU, w/ last BN, cosine annealing | 5.72 | 98 min |
| w/o 1st ReLU, w/ last BN, Cutout | 4.96 | 98 min |
| w/o 1st ReLU, w/ last BN, RandomErasing | 5.22 | 98 min |
| w/o 1st ReLU, w/ last BN, Mixup (300 epochs) | 5.11 | 191 min |

preactivate shortcut after downsampling
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, true, true]' --remove_first_relu false --add_last_bn false --seed 7 --outdir results/experiments/00_preact_after_downsampling/00

w/ 1st ReLU, w/o last BN
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu false --add_last_bn false --seed 7 --outdir results/experiments/01_w_relu_wo_bn/00

w/o 1st ReLU, w/o last BN
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu true --add_last_bn false --seed 7 --outdir results/experiments/02_wo_relu_wo_bn/00

w/ 1st ReLU, w/ last BN
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu false --add_last_bn true --seed 7 --outdir results/experiments/03_w_relu_w_bn/00

w/o 1st ReLU, w/ last BN
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu true --add_last_bn true --seed 7 --outdir results/experiments/04_wo_relu_w_bn/00

w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, true, true]' --remove_first_relu true --add_last_bn true --seed 7 --outdir results/experiments/05_preact_after_downsampling/00

w/o 1st ReLU, w/ last BN, cosine annealing
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu true --add_last_bn true --scheduler cosine --seed 7 --outdir results/experiments/06_cosine_annealing/00

w/o 1st ReLU, w/ last BN, Cutout
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu true --add_last_bn true --use_cutout --seed 7 --outdir results/experiments/07_cutout/00

w/o 1st ReLU, w/ last BN, RandomErasing
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu true --add_last_bn true --use_random_erasing --seed 7 --outdir results/experiments/08_random_erasing/00

w/o 1st ReLU, w/ last BN, Mixup
$ python -u main.py --arch resnet_preact --depth 56 --block_type basic --base_lr 0.2 --preact_stage '[true, false, false]' --remove_first_relu true --add_last_bn true --use_mixup --seed 7 --outdir results/experiments/09_mixup/00

Experiments on label smoothing, Mixup, RICAP, and Dual-Cutout

Results on CIFAR-10

| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|---|---|---|---|
| ResNet-preact-20 | 7.60 | 200 | 24m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.41 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.53 | 200 | 25m |
| ResNet-preact-20, mixup (alpha=1) | 7.24 | 200 | 26m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.88 | 200 | 28m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.77 | 200 | 28m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 6.24 | 200 | 45m |
| ResNet-preact-20 | 7.05 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.05 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.13 | 400 | 49m |
| ResNet-preact-20, mixup (alpha=1) | 6.66 | 400 | 51m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.30 | 400 | 56m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.19 | 400 | 56m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 5.55 | 400 | 1h36m |
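
Label smoothing with parameter epsilon softens the hard one-hot target: the true class keeps probability 1 - epsilon plus its share of a uniform epsilon mass spread over all classes. A minimal sketch:

```python
import numpy as np

def smooth_labels(targets, num_classes, epsilon=0.1):
    """Return soft targets: (1 - epsilon) * one_hot + epsilon / num_classes."""
    onehot = np.eye(num_classes)[np.asarray(targets)]
    return (1 - epsilon) * onehot + epsilon / num_classes
```

With epsilon=0.1 and 10 classes, the true class gets probability 0.91 and every other class 0.01.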

Note

Experiments on batch size and learning rate

Linear scaling rule for learning rate

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.87 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.40 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 200 | 7.75 | 1h17m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 200 | 7.70 | 2h32m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | multistep | 200 | 28.97 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | multistep | 200 | 9.07 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | multistep | 200 | 8.62 | 21m |
| ResNet-preact-20 | 512 | 0.4 | multistep | 200 | 8.23 | 20m |
| ResNet-preact-20 | 256 | 0.2 | multistep | 200 | 8.40 | 21m |
| ResNet-preact-20 | 128 | 0.1 | multistep | 200 | 8.28 | 24m |
| ResNet-preact-20 | 64 | 0.05 | multistep | 200 | 8.13 | 28m |
| ResNet-preact-20 | 32 | 0.025 | multistep | 200 | 7.58 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | multistep | 200 | 7.93 | 1h18m |
| ResNet-preact-20 | 8 | 0.006125 | multistep | 200 | 8.31 | 2h34m |
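
The tables above follow the linear scaling rule: when the batch size is multiplied by k, the initial learning rate is also multiplied by k. Taking the batch-size-128, lr-0.1 row as the reference point, the rule is simply:

```python
def scaled_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling rule: learning rate grows in proportion to batch size."""
    return base_lr * batch_size / base_batch_size
```

For example, `scaled_lr(0.1, 128, 4096)` gives 3.2, matching the first row.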

Linear scaling + longer training

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 400 | 8.97 | 44m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 400 | 7.85 | 43m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 400 | 7.20 | 42m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 400 | 7.83 | 40m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 400 | 7.65 | 42m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 400 | 7.09 | 47m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 400 | 7.17 | 44m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 400 | 7.24 | 2h11m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 400 | 7.26 | 4h10m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 400 | 7.02 | 7h53m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 800 | 8.14 | 1h29m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.74 | 1h23m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.15 | 1h31m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 800 | 7.27 | 1h25m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 800 | 7.22 | 1h26m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 800 | 7.18 | 2h20m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 800 | 7.03 | 4h16m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 800 | 6.78 | 8h37m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 800 | 6.89 | 16h47m |

Effect of initial learning rate

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 4096 | 0.8 | cosine | 200 | 10.71 | 22m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 2048 | 3.2 | cosine | 200 | 11.34 | 21m |
| ResNet-preact-20 | 2048 | 2.4 | cosine | 200 | 8.69 | 21m |
| ResNet-preact-20 | 2048 | 2.0 | cosine | 200 | 8.81 | 21m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 2048 | 0.8 | cosine | 200 | 9.62 | 21m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 1024 | 3.2 | cosine | 200 | 9.12 | 21m |
| ResNet-preact-20 | 1024 | 2.4 | cosine | 200 | 8.42 | 22m |
| ResNet-preact-20 | 1024 | 2.0 | cosine | 200 | 8.38 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 1.2 | cosine | 200 | 8.25 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 1024 | 0.4 | cosine | 200 | 8.49 | 22m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 512 | 3.2 | cosine | 200 | 8.51 | 21m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 256 | 3.2 | cosine | 200 | 9.64 | 22m |
| ResNet-preact-20 | 256 | 1.6 | cosine | 200 | 8.32 | 22m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 256 | 0.4 | cosine | 200 | 7.68 | 22m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 128 | 1.6 | cosine | 200 | 9.03 | 24m |
| ResNet-preact-20 | 128 | 0.8 | cosine | 200 | 7.54 | 24m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 128 | 0.05 | cosine | 200 | 8.81 | 24m |
| ResNet-preact-20 | 128 | 0.025 | cosine | 200 | 10.07 | 24m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 64 | 0.4 | cosine | 200 | 7.42 | 35m |
| ResNet-preact-20 | 64 | 0.2 | cosine | 200 | 7.52 | 36m |
| ResNet-preact-20 | 64 | 0.1 | cosine | 200 | 7.78 | 37m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 32 | 0.2 | cosine | 200 | 7.64 | 1h05m |
| ResNet-preact-20 | 32 | 0.1 | cosine | 200 | 7.25 | 1h08m |
| ResNet-preact-20 | 32 | 0.05 | cosine | 200 | 7.45 | 1h07m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |

Good learning rate + longer training

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 800 | 8.36 | 1h33m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.53 | 1h27m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 800 | 7.30 | 1h30m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.42 | 1h30m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 800 | 6.69 | 1h26m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 800 | 6.77 | 1h26m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 800 | 6.84 | 1h28m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 800 | 6.86 | 1h35m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 800 | 7.05 | 1h38m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 1600 | 8.25 | 3h10m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 1600 | 7.34 | 2h50m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 1600 | 6.94 | 2h52m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 1600 | 6.99 | 2h44m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 1600 | 6.95 | 2h50m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 1600 | 6.64 | 3h09m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 3200 | 9.52 | 6h15m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 3200 | 6.92 | 5h42m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 3200 | 6.96 | 5h43m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (1 run) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 2048 | 1.6 | cosine | 6400 | 7.45 | 11h44m |

LARS

$ python -u train.py --dataset CIFAR10 --arch resnet_preact --depth 20 --block_type basic --seed 7 --scheduler cosine --optimizer lars --base_lr 0.02 --batch_size 4096 --epochs 200 --outdir results/experiment00/00

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 0.005 | cosine | 200 | 14.31 | 22m |
| ResNet-preact-20 | 4096 | 0.01 | cosine | 200 | 9.33 | 22m |
| ResNet-preact-20 | 4096 | 0.015 | cosine | 200 | 8.47 | 22m |
| ResNet-preact-20 | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
| ResNet-preact-20 | 4096 | 0.03 | cosine | 200 | 8.46 | 22m |
| ResNet-preact-20 | 4096 | 0.04 | cosine | 200 | 9.58 | 22m |

| Model | batch size | initial lr | lr schedule | # of epochs | Test Error (median of 3 runs) | Training Time |
|---|---|---|---|---|---|---|
| ResNet-preact-20 | 4096 | 0.02 | cosine | 200 | 8.21 | 22m |
| ResNet-preact-20 | 4096 | 0.02 | cosine | 400 | 7.53 | 44m |
| ResNet-preact-20 | 4096 | 0.02 | cosine | 800 | 7.48 | 1h29m |
| ResNet-preact-20 | 4096 | 0.02 | cosine | 1600 | 7.37 (1 run) | 2h58m |
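
LARS computes a per-layer trust ratio from the norms of the weights and gradients, so every layer takes a step proportional to its own weight norm, which is intended to stabilize training at very large batch sizes (4096 here). A single momentum-free update step, as a hedged sketch (the optimizer used for the table above also applies momentum, and its hyperparameters may differ from these illustrative values):

```python
import numpy as np

def lars_update(weights, grad, lr, eta=0.001, weight_decay=5e-4):
    """One LARS step without momentum: rescale the global lr by the
    trust ratio eta * ||w|| / (||g|| + wd * ||w||) for this layer."""
    w_norm = np.linalg.norm(weights)
    g_norm = np.linalg.norm(grad)
    trust = eta * w_norm / (g_norm + weight_decay * w_norm + 1e-12)
    return weights - lr * trust * (grad + weight_decay * weights)
```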

References

Model architecture

Regularization, data augmentation

Large batch

Others