Knowledge Distillation via the Target-aware Transformer (CVPR2022)
This repository contains the semantic segmentation experiments of our work. See this link for the experiments on ImageNet.
Requirements
- python 3.8
- pytorch >= 1.9.0
- torchvision 0.11.1
- einops
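As a quick sanity check of the environment, a minimal sketch (this helper is ours and not part of the repo):

import sys
import torch
import torchvision
import einops

# Print the versions the experiments assume; adjust if your setup differs.
print("python:", sys.version.split()[0])           # expect 3.8.x
print("pytorch:", torch.__version__)               # expect >= 1.9.0
print("torchvision:", torchvision.__version__)     # expect 0.11.1
print("einops:", einops.__version__)
print("cuda available:", torch.cuda.is_available())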
Note
All experiments are conducted on a single Nvidia A100 (40 GB). Multi-GPU training has not been tested.
Overview
Before getting started
Please modify the dataset paths in mypath.py according to your system; a sketch of the expected layout follows.
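For illustration only, a minimal sketch of how such a path file is commonly laid out; the function name and dataset keys below are assumptions, so keep whatever interface mypath.py actually exposes:

# Hypothetical layout of a dataset-path helper; edit the paths for your system.
def db_root_dir(dataset):
    if dataset == 'pascal':
        return '/path/to/VOCdevkit/VOC2012'   # Pascal VOC root
    elif dataset == 'cocostuff10k':
        return '/path/to/cocostuff-10k'       # COCOStuff-10k root
    else:
        raise NotImplementedError('Unknown dataset: {}'.format(dataset))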
Implementation
Our model is implemented in ./distiller_tat.
We also provide an implementation of ReviewKD in ./distiller_reveiwkd and of other comparison methods (KD/FitNet/AT/ICKD) in ./distiller_comp.
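As a reference point for the comparison methods, here is a minimal sketch of the vanilla logit-based KD loss (Hinton et al.); the function below is ours and is not the interface used in ./distiller_comp:

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Soften both distributions and match them with KL divergence,
    # scaled by T^2 to keep gradient magnitudes comparable.
    # Works for logits of shape (N, C) or (N, C, H, W).
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * temperature ** 2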
Execution
The training entry point is ./train_with_distillation_tat.
Training a teacher model
ResNet101 is used as the teacher backbone.
Pascal VOC
We use the official model. Please download the checkpoint from here and put it in ./pretrained/.
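For orientation, a minimal sketch of loading such a checkpoint into a teacher network; the file name is a placeholder and the torchvision DeepLabV3 model stands in for the repo's actual teacher architecture (the training scripts handle this for you):

import torch
from torchvision.models.segmentation import deeplabv3_resnet101

# Placeholder teacher; the repo's teacher architecture and checkpoint keys may differ.
teacher = deeplabv3_resnet101(num_classes=21)  # 21 classes for Pascal VOC

# Hypothetical file name; use the checkpoint you downloaded into ./pretrained/.
checkpoint = torch.load('./pretrained/teacher_resnet101.pth', map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)  # some checkpoints wrap the weights
teacher.load_state_dict(state_dict, strict=False)      # strict=False since the architectures may not match exactly
teacher.eval()                                          # the teacher stays frozen during distillation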
COCOStuff-10k
We train the teacher ourselves. You may download the checkpoint from here, or simply run:
sh ./train_cocostuff10k_baseline.sh
Training with distillation
Please refer to the shell scripts. For instance, to distill ResNet101 into ResNet18 on Pascal VOC:
sh ./train_voc_resnet18.sh
TO-DO
- Upload the pre-trained COCOStuff-10k teacher model
- Upload training logs
- Add dataset preparation instructions