Information Theoretic Representation Distillation

Code for the BMVC 2022 paper:

"Information Theoretic Representation Distillation".
Roy Miles, Adrian Lopez Rodriguez, Krystian Mikolajczyk. BMVC 2022.

[Paper on arXiv]

Running

The provided code reproduces the CIFAR-100 experimental results reported in our paper, Information Theoretic Representation Distillation. The code has been tested with Python 3.8.

  1. Fetch the pretrained teacher models by running:
sh scripts/fetch_pretrained_teachers.sh

which will download and save the models to save/models.

  2. Install dependencies:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install tensorboard_logger
  3. An example of performing distillation with ITRD losses is given as follows:
python train_student.py --path_t ./save/models/wrn_40_2_vanilla/ckpt_epoch_240.pth --model_s wrn_16_2 -b 1.0 -r 1.0 --lambda_corr 2.0 --lambda_mutual 1.0 --alpha_it 1.01

where --path_t points to the pretrained teacher checkpoint, --model_s selects the student architecture, -r and -b weight the cross-entropy and distillation terms, --lambda_corr and --lambda_mutual weight the correlation and mutual information losses, and --alpha_it sets the α parameter of the ITRD losses.
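
For orientation, below is a minimal sketch of how those weights could enter a combined training objective. It is a hypothetical illustration, not the repository's implementation: correlation_term and mutual_term are simplified stand-ins (a cross-correlation match and a Gram-matrix match between student and teacher embeddings) rather than the loss definitions from the paper, the interpretation of -r and -b as cross-entropy/distillation weights is an assumption, and the effect of --alpha_it is not modelled here; see train_student.py for the actual losses.

```python
import torch
import torch.nn.functional as F

def correlation_term(z_s, z_t, eps=1e-5):
    # Illustrative stand-in for a correlation-based distillation loss (NOT the
    # exact ITRD formulation): push the batch cross-correlation matrix between
    # standardised student and teacher embeddings towards the identity.
    z_s = (z_s - z_s.mean(0)) / (z_s.std(0) + eps)
    z_t = (z_t - z_t.mean(0)) / (z_t.std(0) + eps)
    c = (z_s.T @ z_t) / z_s.shape[0]              # (d, d) cross-correlation
    return ((c - torch.eye(c.shape[0])) ** 2).sum()

def mutual_term(z_s, z_t):
    # Illustrative stand-in for a mutual-information-style loss: match the
    # pairwise (Gram) similarity structure of the student batch to the teacher's.
    g_s = F.normalize(z_s, dim=1) @ F.normalize(z_s, dim=1).T
    g_t = F.normalize(z_t, dim=1) @ F.normalize(z_t, dim=1).T
    return F.mse_loss(g_s, g_t)

# Weights mirroring the command-line flags above; -r and -b are assumed here to
# scale the cross-entropy and distillation terms respectively.
r, b = 1.0, 1.0
lambda_corr, lambda_mutual = 2.0, 1.0

# Dummy batch standing in for student logits/embeddings and teacher embeddings.
logits_s = torch.randn(8, 100, requires_grad=True)
labels = torch.randint(0, 100, (8,))
z_s = torch.randn(8, 128, requires_grad=True)
z_t = torch.randn(8, 128)

loss = r * F.cross_entropy(logits_s, labels) + b * (
    lambda_corr * correlation_term(z_s, z_t)
    + lambda_mutual * mutual_term(z_s, z_t)
)
loss.backward()
```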

Benchmark Results on CIFAR-100

Performance is measured by classification accuracy (%).

  1. Teacher and student are of the same architectural type.

| Teacher <br> Student | wrn-40-2 <br> wrn-16-2 | wrn-40-2 <br> wrn-40-1 | resnet56 <br> resnet20 | resnet110 <br> resnet20 | resnet110 <br> resnet32 | resnet32x4 <br> resnet8x4 | vgg13 <br> vgg8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Teacher <br> Student | 75.61 <br> 73.26 | 75.61 <br> 71.98 | 72.34 <br> 69.06 | 74.31 <br> 69.06 | 74.31 <br> 71.14 | 79.42 <br> 72.50 | 74.64 <br> 70.36 |
| KD | 74.92 | 73.54 | 70.66 | 70.67 | 73.08 | 73.33 | 72.98 |
| FitNet | 73.58 | 72.24 | 69.21 | 68.99 | 71.06 | 73.50 | 71.02 |
| AT | 74.08 | 72.77 | 70.55 | 70.22 | 72.31 | 73.44 | 71.43 |
| SP | 73.83 | 72.43 | 69.67 | 70.04 | 72.69 | 72.94 | 72.68 |
| CC | 73.56 | 72.21 | 69.63 | 69.48 | 71.48 | 72.97 | 70.71 |
| VID | 74.11 | 73.30 | 70.38 | 70.16 | 72.61 | 73.09 | 71.23 |
| RKD | 73.35 | 72.22 | 69.61 | 69.25 | 71.82 | 71.90 | 71.48 |
| PKT | 74.54 | 73.45 | 70.34 | 70.25 | 72.61 | 73.64 | 72.88 |
| AB | 72.50 | 72.38 | 69.47 | 69.53 | 70.98 | 73.17 | 70.94 |
| FT | 73.25 | 71.59 | 69.84 | 70.22 | 72.37 | 72.86 | 70.58 |
| FSP | 72.91 | N/A | 69.95 | 70.11 | 71.89 | 72.62 | 70.23 |
| NST | 73.68 | 72.24 | 69.60 | 69.53 | 71.96 | 73.30 | 71.53 |
| CRD | 75.48 | 74.14 | 71.16 | 71.46 | 73.48 | 75.51 | 73.94 |
| WCoRD | 76.11 | 74.72 | 71.92 | 71.88 | 74.20 | 76.15 | 74.72 |
| ReviewKD | 76.12 | 75.09 | 71.89 | - | 73.89 | 75.63 | 74.85 |
| ITRD | 76.12 | 75.18 | 71.47 | 71.99 | 74.26 | 76.69 | 74.93 |
  2. Teacher and student are of different architectural types.

| Teacher <br> Student | vgg13 <br> MobileNetV2 | ResNet50 <br> MobileNetV2 | ResNet50 <br> vgg8 | resnet32x4 <br> ShuffleNetV1 | resnet32x4 <br> ShuffleNetV2 | wrn-40-2 <br> ShuffleNetV1 |
| --- | --- | --- | --- | --- | --- | --- |
| Teacher <br> Student | 74.64 <br> 64.60 | 79.34 <br> 64.60 | 79.34 <br> 70.36 | 79.42 <br> 70.50 | 79.42 <br> 71.82 | 75.61 <br> 70.50 |
| KD | 67.37 | 67.35 | 73.81 | 74.07 | 74.45 | 74.83 |
| FitNet | 64.14 | 63.16 | 70.69 | 73.59 | 73.54 | 73.73 |
| AT | 59.40 | 58.58 | 71.84 | 71.73 | 72.73 | 73.32 |
| SP | 66.30 | 68.08 | 73.34 | 73.48 | 74.56 | 74.52 |
| CC | 64.86 | 65.43 | 70.25 | 71.14 | 71.29 | 71.38 |
| VID | 65.56 | 67.57 | 70.30 | 73.38 | 73.40 | 73.61 |
| RKD | 64.52 | 64.43 | 71.50 | 72.28 | 73.21 | 72.21 |
| PKT | 67.13 | 66.52 | 73.01 | 74.10 | 74.69 | 73.89 |
| AB | 66.06 | 67.20 | 70.65 | 73.55 | 74.31 | 73.34 |
| FT | 61.78 | 60.99 | 70.29 | 71.75 | 72.50 | 72.03 |
| NST | 58.16 | 64.96 | 71.28 | 74.12 | 74.68 | 74.89 |
| CRD | 69.73 | 69.11 | 74.30 | 75.11 | 75.65 | 76.05 |
| WCoRD | 70.02 | 70.12 | 74.68 | 75.77 | 76.48 | 76.68 |
| ReviewKD | 70.37 | 69.89 | - | 77.45 | 77.78 | 77.14 |
| ITRD | 70.39 | 71.34 | 75.49 | 76.69 | 77.40 | 77.09 |

Binary Distillation

Code can be found at: https://drive.google.com/file/d/1WJ_rGIsQ-SaqvXzNsfkxcNA6rWBbcnD8/view?usp=sharing

Relevant comments: https://github.com/roymiles/ITRD/issues/1

Citation

@inproceedings{miles2022itrd,
  title={Information Theoretic Representation Distillation},
  author={Miles, Roy and Lopez-Rodriguez, Adrian and Mikolajczyk, Krystian},
  booktitle={British Machine Vision Conference (BMVC)},
  year={2022}
}

Acknowledgements

Our code is based on the code provided in RepDistiller.