SemCKD

Cross-Layer Distillation with Semantic Calibration (AAAI-2021) https://arxiv.org/abs/2012.03236v1

The journal version was published in IEEE TKDE: https://ieeexplore.ieee.org/document/9767633

A more compact and clear implementation (SimKD, CVPR-2022) is provided at https://github.com/DefangChen/SimKD

Overview

Existing feature distillation methods can be divided into two categories according to the position at which knowledge distillation is performed. As shown in the figure below, one is feature-map distillation and the other is feature-embedding distillation.

[Figure: comparison of feature-map distillation and feature-embedding distillation]

SemCKD belongs to feature-map distillation and is compatible with state-of-the-art feature-embedding distillation approaches (e.g., CRD), which can further boost the performance of student networks.

This repo contains the implementation of SemCKD together with the compared approaches, including classic KD, feature-map distillation variants such as FitNet, AT, SP, VID, and HKD, and feature-embedding distillation variants such as PKT, RKD, IRG, CC, and CRD.
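
For readers unfamiliar with the mechanism, below is a minimal PyTorch-style sketch of the cross-layer idea: each student feature map attends over all candidate teacher feature maps, and the distillation loss is an attention-weighted regression toward the teacher maps. All class and helper names here are illustrative assumptions, not the repository's actual API; see the official code for the real implementation.

# Hypothetical sketch of cross-layer, attention-weighted feature distillation.
# Names and shapes are illustrative only; the repository's implementation differs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerAttention(nn.Module):
    """Project pooled student/teacher features into a shared query/key space and
    softmax over teacher layers, giving each student layer a soft layer assignment."""
    def __init__(self, s_channels, t_channels, dim=128):
        super().__init__()
        self.query = nn.ModuleList([nn.Linear(c, dim) for c in s_channels])
        self.key = nn.ModuleList([nn.Linear(c, dim) for c in t_channels])

    def forward(self, f_s_list, f_t_list):
        q = torch.stack([proj(f.mean(dim=(2, 3)))
                         for proj, f in zip(self.query, f_s_list)], dim=1)   # (B, S, D)
        k = torch.stack([proj(f.mean(dim=(2, 3)))
                         for proj, f in zip(self.key, f_t_list)], dim=1)     # (B, T, D)
        logits = torch.bmm(q, k.transpose(1, 2)) / q.shape[-1] ** 0.5        # (B, S, T)
        return F.softmax(logits, dim=2)

def cross_layer_loss(f_s_list, f_t_list, attention, projectors):
    """Attention-weighted MSE between projected student maps and teacher maps.
    projectors[i][j] is a 1x1 conv mapping student layer i to teacher layer j's channels."""
    attn = attention(f_s_list, f_t_list)                                     # (B, S, T)
    loss = 0.0
    for i, f_s in enumerate(f_s_list):
        for j, f_t in enumerate(f_t_list):
            proj = F.interpolate(projectors[i][j](f_s), size=f_t.shape[2:])
            per_sample = F.mse_loss(proj, f_t, reduction='none').mean(dim=(1, 2, 3))
            loss = loss + (attn[:, i, j] * per_sample).mean()
    return loss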

CIFAR-100 Results

[Table: CIFAR-100 results of SemCKD and the compared approaches]

Here ARI stands for Average Relative Improvement. This metric reflects the extent to which SemCKD further improves upon existing approaches, relative to the improvement those approaches make over the baseline student model.
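
As a concrete reading of that description, here is a small helper (formula inferred from the sentence above, accuracy numbers made up; consult the paper for the exact definition):

# Inferred ARI computation based on the description above (not copied from the paper).
def average_relative_improvement(acc_semckd, acc_approaches, acc_student):
    """Mean over compared approaches of
       (Acc_SemCKD - Acc_approach) / (Acc_approach - Acc_student) * 100%."""
    ratios = [(acc_semckd - a) / (a - acc_student) * 100.0 for a in acc_approaches]
    return sum(ratios) / len(ratios)

# Made-up example accuracies (%): student 72.5, two compared approaches, SemCKD 74.4.
print(average_relative_improvement(74.4, [73.6, 74.0], 72.5))  # -> ~49.7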

To get the pretrained teacher models for CIFAR-100:

sh scripts/fetch_pretrained_teachers.sh

For ImageNet, pretrained models from torchvision are used, e.g., ResNet34. Save the model to ./save/models/$MODEL_vanilla/ and run scripts/model_transform.py to make it readable by our code.
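
For example, a minimal sketch of fetching the torchvision ResNet-34 weights and saving them under that directory (the file name and checkpoint key are assumptions; check scripts/model_transform.py for the format it actually expects):

# Sketch: download the torchvision ResNet-34 weights and save them under the
# expected directory. The checkpoint key ('model') and file name are assumptions;
# scripts/model_transform.py defines the format the training code actually reads.
import os
import torch
from torchvision.models import resnet34

model = resnet34(pretrained=True)            # downloads the ImageNet-pretrained weights
save_dir = './save/models/ResNet34_vanilla'
os.makedirs(save_dir, exist_ok=True)
torch.save({'model': model.state_dict()},
           os.path.join(save_dir, 'resnet34_vanilla.pth'))
# Afterwards, run scripts/model_transform.py to produce the transformed checkpoint
# referenced by --path-t below (e.g., resnet34_transformed.pth).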

Running SemCKD:

# CIFAR-100
python train_student.py --path-t ./save/models/resnet32x4_vanilla/ckpt_epoch_240.pth --distill semckd --model_s resnet8x4 -r 1 -a 1 -b 400 --trial 0
# ImageNet
python train_student.py --path-t ./save/models/ResNet34_vanilla/resnet34_transformed.pth \
--batch_size 256 --epochs 90 --dataset imagenet --gpu_id 0,1,2,3,4,5,6,7 --dist-url tcp://127.0.0.1:23333 \
--print-freq 100 --num_workers 32 --distill semckd --model_s ResNet18 -r 1 -a 1 -b 50 --trial 0 \
--multiprocessing-distributed --learning_rate 0.1 --lr_decay_epochs 30,60 --weight_decay 1e-4 --dali gpu

Note:

Citation

If you find this repository useful, please consider citing the following papers:

@inproceedings{chen2021cross,
  author    = {Defang Chen and Jian{-}Ping Mei and Yuan Zhang and Can Wang and Zhe Wang and Yan Feng and Chun Chen},
  title     = {Cross-Layer Distillation with Semantic Calibration},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  pages     = {7028--7036},
  year      = {2021},
}

@article{wang2022semckd,
  author    = {Wang, Can and Chen, Defang and Mei, Jian-Ping and Zhang, Yuan and Feng, Yan and Chen, Chun},
  title     = {SemCKD: Semantic Calibration for Cross-Layer Knowledge Distillation},
  journal   = {IEEE Transactions on Knowledge and Data Engineering},
  volume    = {35},
  number    = {6},
  pages     = {6305--6319},
  year      = {2022},
  publisher = {IEEE}
}