SuperMix: Supervising the Mixing Data Augmentation


PyTorch implementation of the SuperMix paper, a supervised method for data augmentation (to appear in CVPR 2021).

Run SuperMix

Run on the ImageNet data

  1. Run supermix.py (a sketch for loading the generated images is given after the sample output below):

     python3 supermix.py --dataset imagenet --model resnet34 --save_dir ./outputdir --bs 16 --aug_size 50000 --w 16 --sigma 2

  2. Sample outputs:

<p align="center"> <img src="https://github.com/alldbi/KDA/blob/master/examples/imagenet.png"> </p>
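
The script writes the augmented images to the directory passed via --save_dir. As a minimal sketch of how they could be fed to a downstream PyTorch training loop (assuming a class-per-subfolder layout under ./outputdir, which is an assumption rather than the repo's documented output format):

```python
# Hypothetical loader for the images produced by supermix.py.
# Assumes ./outputdir contains one subfolder per class (torchvision ImageFolder
# layout); check the actual output format of supermix.py before relying on this.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

aug_set = datasets.ImageFolder("./outputdir", transform=transform)
aug_loader = torch.utils.data.DataLoader(aug_set, batch_size=16,
                                         shuffle=True, num_workers=4)

for images, labels in aug_loader:
    # feed the mixed images to the student/classifier training step here
    break
```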

Run on the CIFAR-100 data

  1. Download the pretrained teacher models by:

     sh scripts/fetch_pretrained_teachers.sh

     which saves the models to save/models

  2. Run supermix.py (an illustrative sketch of the supervised mixing idea is given after the sample output below):

     python3 supermix.py --dataset cifar100 --model resnet110 --save_dir ./outputdir --bs 64 --aug_size 50000 --w 8 --sigma 1

  3. Sample outputs:
<p align="center"> <img src="https://github.com/alldbi/KDA/blob/master/examples/cifar100.png"> </p>
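
For intuition, the sketch below illustrates the general idea behind supervised mixing: a teacher model supplies spatial saliency for each input, the saliency maps are smoothed and normalized into mixing masks, and the inputs are blended under those masks with soft labels weighted by mask area. This is a simplified toy illustration under assumed choices (gradient-based saliency, average-pool smoothing, pairwise mixing); it is not the optimization procedure implemented in supermix.py, so refer to the paper for the actual algorithm.

```python
# Toy illustration of saliency-guided mixing; NOT the exact SuperMix algorithm.
import torch
import torch.nn.functional as F

def teacher_saliency(teacher, x):
    """Per-pixel saliency from the teacher's top logit (an assumed choice)."""
    x = x.clone().requires_grad_(True)
    teacher(x).max(dim=1).values.sum().backward()
    return x.grad.abs().mean(dim=1, keepdim=True)         # (B, 1, H, W)

def supervised_mix(teacher, x1, y1, x2, y2, num_classes, sigma=1.0):
    s1 = teacher_saliency(teacher, x1)
    s2 = teacher_saliency(teacher, x2)
    # Smooth the saliency maps; average pooling stands in for a Gaussian blur.
    k = 2 * int(sigma) + 3                                 # small odd kernel
    s1 = F.avg_pool2d(s1, k, stride=1, padding=k // 2)
    s2 = F.avg_pool2d(s2, k, stride=1, padding=k // 2)
    mask = s1 / (s1 + s2 + 1e-8)                           # soft mask in [0, 1]
    x_mix = mask * x1 + (1 - mask) * x2
    lam = mask.mean(dim=(1, 2, 3))                         # label weight per image
    y_mix = lam[:, None] * F.one_hot(y1, num_classes).float() + \
            (1 - lam)[:, None] * F.one_hot(y2, num_classes).float()
    return x_mix, y_mix
```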

Evaluating SuperMix for knowledge distillation and object classification

The distillation code is forked/copied from the official code of CRD.

  1. Fetch the pretrained teacher models by:

     sh scripts/fetch_pretrained_teachers.sh

     which will download and save the models to save/models

  2. Produce augmented data using SuperMix by:

     python3 supermix.py --dataset cifar100 --model resnet110 --save_dir ./output --bs 128 --aug_size 500000 --w 8 --sigma 1

  3. Run the distillation (a minimal sketch of the standard KD objective is given after this list).

  4. (optional) Train teacher networks from scratch. Example commands are in scripts/run_cifar_vanilla.sh
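
For reference, the "+KD" rows in the benchmark table below pair each augmentation with a distillation objective. A minimal sketch of the standard knowledge-distillation loss (Hinton et al.) is shown here; the temperature and loss weight are placeholder values and may differ from the settings used in this repo.

```python
# Minimal sketch of the standard KD loss (Hinton et al.); T and alpha are
# illustrative placeholders, not this repo's defaults.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    # Soft term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```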

Note: the default setting is for single-GPU training. If you would like to run this repo with multiple GPUs, you may need to tune the learning rate, which empirically should be scaled up linearly with the batch size; see this paper.
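
As a concrete illustration of that linear scaling rule (the base values below are placeholders, not this repo's defaults):

```python
# Linear learning-rate scaling for larger (multi-GPU) batches; placeholder values.
base_lr = 0.05          # learning rate tuned for the single-GPU batch size
base_batch_size = 64    # batch size the base_lr was tuned for
batch_size = 256        # effective batch size across all GPUs

scaled_lr = base_lr * batch_size / base_batch_size   # 0.05 * 256 / 64 = 0.2
```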

Benchmark Results on CIFAR-100:

Performance is measured by classification accuracy (%)

| Teacher <br> Student | wrn-40-2 <br> wrn-16-2 | wrn-40-2 <br> wrn-40-1 | resnet56 <br> resnet20 | resnet110 <br> resnet20 | resnet110 <br> resnet32 | resnet32x4 <br> resnet8x4 | vgg13 <br> vgg8 |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Teacher <br> Student | 75.61 <br> 73.26 | 75.61 <br> 71.98 | 72.34 <br> 69.06 | 74.31 <br> 69.06 | 74.31 <br> 71.14 | 79.42 <br> 72.50 | 74.64 <br> 70.36 |
| KD | 74.92 | 73.54 | 70.66 | 70.67 | 73.08 | 73.33 | 72.98 |
| FitNet | 73.58 | 72.24 | 69.21 | 68.99 | 71.06 | 73.50 | 71.02 |
| AT | 74.08 | 72.77 | 70.55 | 70.22 | 72.31 | 73.44 | 71.43 |
| SP | 73.83 | 72.43 | 69.67 | 70.04 | 72.69 | 72.94 | 72.68 |
| CC | 73.56 | 72.21 | 69.63 | 69.48 | 71.48 | 72.97 | 70.71 |
| VID | 74.11 | 73.30 | 70.38 | 70.16 | 72.61 | 73.09 | 71.23 |
| RKD | 73.35 | 72.22 | 69.61 | 69.25 | 71.82 | 71.90 | 71.48 |
| PKT | 74.54 | 73.45 | 70.34 | 70.25 | 72.61 | 73.64 | 72.88 |
| AB | 72.50 | 72.38 | 69.47 | 69.53 | 70.98 | 73.17 | 70.94 |
| FT | 73.25 | 71.59 | 69.84 | 70.22 | 72.37 | 72.86 | 70.58 |
| FSP | 72.91 | 0.00 | 69.95 | 70.11 | 71.89 | 72.62 | 70.23 |
| NST | 73.68 | 72.24 | 69.60 | 69.53 | 71.96 | 73.30 | 71.53 |
| CRD | 75.48 | 74.14 | 71.16 | 71.46 | 73.48 | 75.51 | 73.94 |
| CRD+KD | 75.64 | 74.38 | 71.63 | 71.56 | 73.75 | 75.46 | 74.29 |
| ImgNet32 | 74.91 | 74.80 | 71.38 | 71.48 | 73.17 | 75.57 | 73.95 |
| MixUp | 76.20 | 75.53 | 72.00 | 72.27 | 74.60 | 76.73 | 74.56 |
| CutMix | 76.40 | 75.85 | 72.33 | 72.68 | 74.24 | 76.81 | 74.87 |
| SuperMix | 76.93 | 76.11 | 72.64 | 72.75 | 74.80 | 77.16 | 75.38 |
| ImgNet32+KD | 76.52 | 75.70 | 72.22 | 72.23 | 74.24 | 76.46 | 75.02 |
| MixUp+KD | 76.58 | 76.10 | 72.89 | 72.82 | 74.94 | 77.07 | 75.58 |
| CutMix+KD | 76.81 | 76.45 | 72.67 | 72.83 | 74.87 | 76.90 | 75.50 |
| SuperMix+KD | 77.45 | 76.53 | 73.19 | 72.96 | 75.21 | 77.59 | 76.03 |

Questions

If you have a question about any part of the code, or anything needs further clarification, please open an issue or send me an email: ad0046@mix.wvu.edu.

Citation

If you find SuperMix helpful for your research, please cite our paper:

@article{dabouei2020,
  title={SuperMix: Supervising the Mixing Data Augmentation},
  author={Dabouei, Ali and Soleymani, Sobhan and Taherkhani, Fariborz and Nasrabadi, Nasser M},
  journal={arXiv preprint arXiv:2003.05034},
  year={2020}
}

Moreover, if you are developing distillation methods, we encourage you to cite CRD for its notable contribution of benchmarking the state-of-the-art distillation methods.

@inproceedings{tian2019crd,
  title={Contrastive Representation Distillation},
  author={Yonglong Tian and Dilip Krishnan and Phillip Isola},
  booktitle={International Conference on Learning Representations},
  year={2020}
}