
A PyTorch Implementation of CIFAR Tricks

Implementations of CNN models, Transformer models, training tricks, data augmentation, and regularization methods on the CIFAR-10 dataset. Issues and PRs are welcome.

0. Requirements

1. Implementations

1.0 Models

Vision Transformer models:

| Model | GPU Mem | Top-1 train (%) | Top-1 val (%) | Weights (M) |
| --- | --- | --- | --- | --- |
| vision_transformer | 2869M | 68.96 | 69.02 | 47.6 |
| mobilevit_s | 2009M | 98.83 | 92.50 | 19.2 |
| mobilevit_xs | 1681M | 98.22 | 91.77 | 7.78 |
| mobilevit_xxs | 1175M | 96.40 | 90.17 | 4.0 |
| coatnet_0 | 1433M | 99.94 | 90.15 | 64.9 |
| coatnet_1 | 2089M | 99.97 | 90.09 | 123 |
| coatnet_2 | 2405M | 99.99 | 90.86 | 208 |
| cvt | 2593M | 94.64 | 84.74 | 75 |
| swin_t | 3927M | 93.24 | 86.09 | 104 |
| swin_s | 6707M | 90.27 | 83.68 | 184 |
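
For reference, the GPU Mem and weight columns can be obtained with a short profiling helper; the sketch below is only an illustration (the `build_model` constructor in the comment and the single-image batch are assumptions, not this repo's API):

```python
import torch
import torch.nn as nn


def profile(model: nn.Module, input_size=(1, 3, 32, 32), device="cuda"):
    """Return parameter count (M) and peak GPU memory (MB) for one forward/backward pass."""
    model = model.to(device)
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    torch.cuda.reset_peak_memory_stats(device)
    x = torch.randn(*input_size, device=device)
    model(x).sum().backward()
    mem_mb = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    return params_m, mem_mb

# Example (hypothetical constructor): params_m, mem_mb = profile(build_model("mobilevit_s"))
```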

1.1 Tricks

1.2 Augmentation

2. Training

2.1 CIFAR-10 Training Examples

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True
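
For reference, RICAP patches random crops from four shuffled views of the batch into one image and mixes the labels in proportion to patch area. A minimal batch-level sketch in the spirit of the pytorch-ricap reference implementation (the `beta` value and tensor shapes are assumptions):

```python
import numpy as np
import torch


def ricap(images, targets, beta=0.3):
    """RICAP: patch random crops of four shuffled views of the batch into one image.

    images: (B, C, H, W) tensor, targets: (B,) tensor.
    Returns the patched images plus per-patch targets and area weights for the loss.
    """
    B, _, H, W = images.size()
    # Boundary of the 2x2 patch layout, drawn from a Beta distribution.
    w = int(np.round(W * np.random.beta(beta, beta)))
    h = int(np.round(H * np.random.beta(beta, beta)))
    widths = [w, W - w, w, W - w]
    heights = [h, h, H - h, H - h]

    patched = torch.zeros_like(images)
    patch_targets, patch_weights = [], []
    for k, (wk, hk) in enumerate(zip(widths, heights)):
        idx = torch.randperm(B)                          # shuffle the batch for patch k
        x = np.random.randint(0, W - wk + 1)
        y = np.random.randint(0, H - hk + 1)
        crop = images[idx, :, y:y + hk, x:x + wk]
        if k == 0:
            patched[:, :, :h, :w] = crop
        elif k == 1:
            patched[:, :, :h, w:] = crop
        elif k == 2:
            patched[:, :, h:, :w] = crop
        else:
            patched[:, :, h:, w:] = crop
        patch_targets.append(targets[idx])
        patch_weights.append(wk * hk / (W * H))          # label weight = patch area ratio
    return patched, patch_targets, patch_weights

# Mixed loss: sum(w_k * criterion(out, y_k) for y_k, w_k in zip(patch_targets, patch_weights))
```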

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True
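
Random Erasing can also be applied directly in the input pipeline via torchvision; a minimal sketch with commonly used CIFAR-10 statistics (the probability and normalization values here are illustrative, not necessarily this repo's settings):

```python
import torchvision.transforms as T

# Standard CIFAR-10 training pipeline with Random Erasing appended.
# RandomErasing works on tensors, so it must come after ToTensor().
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    T.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
])
```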

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True
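
Mixup linearly interpolates pairs of images and their labels within a batch; a minimal sketch of how a `--mixup` style option is typically implemented (the Beta(alpha, alpha) sampling with alpha = 1.0 is an assumption):

```python
import numpy as np
import torch


def mixup_batch(images, targets, alpha=1.0):
    """Mix every sample with a randomly chosen partner from the same batch."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(images.size(0), device=images.device)
    mixed = lam * images + (1.0 - lam) * images[index]
    return mixed, targets, targets[index], lam

# Training step: loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
```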

More scripts can be found in scripts/run.sh.

3. Results

3.1 Results from the original pytorch-ricap

| Model | Error rate (acc %) | Loss | Error rate, paper (%) |
| --- | --- | --- | --- |
| WideResNet28-10 baseline | 3.82 (96.18) | 0.158 | 3.89 |
| WideResNet28-10 +RICAP | 2.82 (97.18) | 0.141 | 2.85 |
| WideResNet28-10 +Random Erasing | 3.18 (96.82) | 0.114 | 4.65 |
| WideResNet28-10 +Mixup | 3.02 (96.98) | 0.158 | 3.02 |

3.2 Reimplementation Results

| Model | Error rate (acc %) | Loss | Error rate, paper (%) |
| --- | --- | --- | --- |
| WideResNet28-10 baseline | 3.78 (96.22) | | 3.89 |
| WideResNet28-10 +RICAP | 2.81 (97.19) | | 2.85 |
| WideResNet28-10 +Random Erasing | 3.03 (96.97) | 0.113 | 4.65 |
| WideResNet28-10 +Mixup | 2.93 (97.07) | 0.158 | 3.02 |

3.3 Quick architecture comparison on half of the training data

Reimplemented models (no augmentation, half of the training data, 200 epochs, batch size 128). Values in parentheses are test accuracy (%); a sketch of the half-data split follows the table.

| Model | Error rate (acc %) | Loss |
| --- | --- | --- |
| lenet (CPU usage explodes) | (70.76) | |
| wideresnet | 3.78 (96.22) | |
| resnet20 | (89.72) | |
| senet | (92.34) | |
| resnet18 | (92.08) | |
| resnet34 | (92.48) | |
| resnet50 | (91.72) | |
| regnet | (92.58) | |
| nasnet | out of memory | |
| shake_resnet26_2x32d | (93.06) | |
| shake_resnet26_2x64d | (94.14) | |
| densenet | (92.06) | |
| dla | (92.58) | |
| googlenet | (91.90) | 0.2675 |
| efficientnetb0 (low GPU utilization, slow) | (86.82) | 0.5024 |
| mobilenet (low GPU utilization) | (89.18) | |
| mobilenetv2 | (91.06) | |
| pnasnet | (90.44) | |
| preact_resnet | (90.76) | |
| resnext | (92.30) | |
| vgg (high CPU and GPU utilization) | (88.38) | |
| inceptionv3 | (91.84) | |
| inceptionv4 | (91.10) | |
| inception_resnet_v2 | (83.46) | |
| rir | (92.34) | 0.3932 |
| squeezenet (high CPU utilization) | (89.16) | 0.4311 |
| stochastic_depth_resnet18 | (90.22) | |
| xception | | |
| dpn | (92.06) | 0.3002 |
| ge_resnext29_8x64d (very slow) | (93.86) | |
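
The half-data setting above can be reproduced by training on a fixed random half of the CIFAR-10 training set; a minimal sketch (the 50% split, seed, and loader settings are assumptions):

```python
import torch
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader, Subset

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

# Keep a fixed random half of the 50k training images (25k samples).
g = torch.Generator().manual_seed(0)
perm = torch.randperm(len(full_train), generator=g)
half_train = Subset(full_train, perm[: len(full_train) // 2].tolist())

train_loader = DataLoader(half_train, batch_size=128, shuffle=True, num_workers=4)
```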

3.4 CPU/GPU utilization tests

TEST: scale/kernel ToyNet

Training the network with different convolutional depths leads to the following observation:

Conclusion: for a network with very little convolution, such as LeNet with only two conv layers, CPU utilization is high and GPU utilization is low. When depth is increased in a straight VGG-style stack, the deeper the network, the lower the CPU utilization and the higher the GPU utilization.
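
A minimal sketch of this kind of depth-scaling experiment, assuming a plain VGG-style ToyNet whose number of conv layers is a parameter (the network below is illustrative, not the repo's exact ToyNet):

```python
import torch.nn as nn


def make_toynet(depth: int, width: int = 64, num_classes: int = 10) -> nn.Sequential:
    """Plain 'straight-pipe' conv stack; larger `depth` shifts the load from CPU to GPU."""
    layers, in_ch = [], 3
    for _ in range(depth):
        layers += [nn.Conv2d(in_ch, width, 3, padding=1),
                   nn.BatchNorm2d(width),
                   nn.ReLU(inplace=True)]
        in_ch = width
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width, num_classes)]
    return nn.Sequential(*layers)

shallow, deep = make_toynet(depth=2), make_toynet(depth=16)
```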

Varying the batch size during training leads to the following observation:

Conclusion: the batch size affects how well training converges.

3.5 Cutout and Mixup under a StepLR schedule

| Architecture | Epochs | Cutout | Mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | 96.33 |
| shake_resnet26_2x64d | 200 | ✓ | | 96.99 |
| shake_resnet26_2x64d | 200 | | ✓ | 96.60 |
| shake_resnet26_2x64d | 200 | ✓ | ✓ | 96.46 |
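
The runs in this table use a step learning-rate schedule; a minimal sketch of the optimizer/scheduler setup (the placeholder model, step size, decay factor, and SGD hyperparameters are assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(32 * 32 * 3, 10)   # placeholder; the table uses shake_resnet26_2x64d
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4, nesterov=True)
# Decay the learning rate by 10x every 60 epochs (step size and gamma are illustrative).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)

for epoch in range(200):
    ...                              # one training epoch over CIFAR-10 goes here
    scheduler.step()
```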

3.6 SAM, ASAM, Cosine LR Decay, and Label Smoothing

| Architecture | Epochs | SAM | ASAM | Cosine LR Decay | LabelSmooth | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | | | 96.51 |
| shake_resnet26_2x64d | 200 | | | | | 96.80 |
| shake_resnet26_2x64d | 200 | | | | | 96.61 |
| shake_resnet26_2x64d | 200 | | | | | 96.57 |

Note: with a longer training schedule (1800 epochs), other repositories report that shake_resnet26_2x64d reaches 97.71% test accuracy with cutout and mixup.
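
Two of the options above map directly onto standard PyTorch components (PyTorch >= 1.10); a minimal sketch of Cosine LR Decay and label smoothing (the placeholder model and hyperparameters are assumptions; SAM/ASAM need a separate two-step optimizer and are not shown):

```python
import torch
import torch.nn as nn

model = nn.Linear(32 * 32 * 3, 10)   # placeholder network
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)                          # LabelSmooth
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)  # Cosine LR Decay

for epoch in range(200):
    ...                              # one training epoch goes here
    scheduler.step()
```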

3.7 Cosine LR schedule with shake-shake

| Architecture | Epochs | Cutout | Mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 300 | | | 96.66 |
| shake_resnet26_2x64d | 300 | ✓ | | 97.21 |
| shake_resnet26_2x64d | 300 | | ✓ | 96.90 |
| shake_resnet26_2x64d | 300 | ✓ | ✓ | 96.73 |

Results from CIFAR ZOO at 1800 epochs; they are not reproduced here because training would take too long.

| Architecture | Epochs | Cutout | Mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 1800 | | | 96.94 (CIFAR ZOO) |
| shake_resnet26_2x64d | 1800 | ✓ | | 97.20 (CIFAR ZOO) |
| shake_resnet26_2x64d | 1800 | | ✓ | 97.42 (CIFAR ZOO) |
| shake_resnet26_2x64d | 1800 | ✓ | ✓ | 97.71 (CIFAR ZOO) |

3.8 Divide and Co-training

Reproduction: on a single V100 GPU, about 4 min per epoch × 300 epochs ≈ 20 h; top-1 accuracy 97.59%, the best result in this project so far.

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'
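
The `--sched 'warmcosine'` option suggests a linear warmup followed by cosine decay; a minimal sketch of such a schedule built from standard PyTorch schedulers (requires a recent PyTorch; the warmup length and placeholder model are assumptions):

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = nn.Linear(32 * 32 * 3, 10)   # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)

# 5 warmup epochs ramping from 10% to 100% of the base LR, then cosine decay until epoch 300.
warmup = LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = CosineAnnealingLR(optimizer, T_max=295)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[5])
```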

3.9 Comparison of data augmentation methods

| Architecture | Epochs | Cutout | Mixup | AutoAugment | Random Erasing | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | | | 96.42 |
| shake_resnet26_2x64d | 200 | ✓ | | | | 96.49 |
| shake_resnet26_2x64d | 200 | | ✓ | | | 96.17 |
| shake_resnet26_2x64d | 200 | | | ✓ | | 96.25 |
| shake_resnet26_2x64d | 200 | | | | ✓ | 96.20 |
| shake_resnet26_2x64d | 200 | ✓ | ✓ | | | 95.82 |
| shake_resnet26_2x64d | 200 | ✓ | | ✓ | | 96.02 |
| shake_resnet26_2x64d | 200 | ✓ | | | ✓ | 96.00 |
| shake_resnet26_2x64d | 200 | | ✓ | ✓ | | 95.83 |
| shake_resnet26_2x64d | 200 | | ✓ | | ✓ | 95.89 |
| shake_resnet26_2x64d | 200 | | | ✓ | ✓ | 96.25 |

The corresponding training commands (one per table row, in order):
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64
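
Among the flags above, cutout is not covered by the earlier sketches; a minimal sketch of a Cutout transform that zeroes a random square patch of a tensor image (the patch length of 16 follows the original Cutout paper's CIFAR-10 setting, but the class below is illustrative, not this repo's implementation):

```python
import numpy as np
import torch


class Cutout:
    """Zero out one random square patch of a (C, H, W) tensor image."""

    def __init__(self, length: int = 16):
        self.length = length

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        _, h, w = img.shape
        y, x = np.random.randint(h), np.random.randint(w)
        y1, y2 = np.clip([y - self.length // 2, y + self.length // 2], 0, h)
        x1, x2 = np.clip([x - self.length // 2, x + self.length // 2], 0, w)
        img = img.clone()
        img[:, y1:y2, x1:x2] = 0.0
        return img

# Typically appended after ToTensor()/Normalize() in the training transform pipeline.
```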

3.10 Attention mechanisms

| Model | Top-1 train (%) | Top-1 val (%) | Weights (M) |
| --- | --- | --- | --- |
| spp_d11_pN | 100 | 86.79 | 7.36 |
| spp_d11_pA | 100 | 85.83 | 7.36 |
| spp_d11_pB | 100 | 85.66 | 7.36 |
| spp_d11_pC | 100 | 85.56 | 7.36 |
| spp_d11_pD | 100 | 85.73 | 7.36 |
| spp_d20_pN | 100 | 90.59 | 13.4 |
| spp_d20_pA | 100 | 89.96 | 13.4 |
| spp_d20_pB | 100 | 89.26 | 13.4 |
| spp_d20_pC | 100 | 89.69 | 13.4 |
| spp_d20_pD | 100 | 89.93 | 13.4 |
| spp_d29_pN | 99.99 | 89.56 | 19.4 |
| spp_d29_pA | 100 | 90.13 | 19.4 |
| spp_d29_pB | 100 | 90.16 | 19.4 |
| spp_d29_pC | 100 | 90.09 | 19.4 |
| spp_d29_pD | 100 | 90.06 | 19.4 |
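
The spp_* variants differ in where an attention module is placed (positions N/A/B/C/D). As a generic illustration of a channel-attention block of this kind, here is a textbook squeeze-and-excitation module (an illustration only, not necessarily the exact module these models use):

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: reweight channels by a learned gate."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gate

# Example: out = SEBlock(64)(torch.randn(2, 64, 32, 32))
```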

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

[8] https://myrtle.ai/learn/how-to-train-your-resnet/