Knowledge-Distillation-Zoo

News! I am planning a major update of this repo. The new version will contain most of the methods in the Todo list. Please stay tuned.

PyTorch implementation of various Knowledge Distillation (KD) methods.

This repository is a simple reference that mainly focuses on basic knowledge distillation/transfer methods. Many tricks and variations, such as step-by-step training, iterative training, ensembles of teachers, ensembles of KD methods, data-free distillation, self-distillation, and online distillation, are therefore not considered. I hope it is useful for your project or research.

I will update this repo regularly with new KD methods. If I have missed any basic methods, please contact me.

Lists

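<table>
<tr> <td>Name</td> <td>Method</td> <td>Paper Link</td> <td>Code Link</td> </tr>
<tr> <td>Baseline</td> <td>basic model with softmax loss</td> <td>-</td> <td>code</td> </tr>
<tr> <td>Logits</td> <td>mimic learning via regressing logits</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>ST</td> <td>soft target</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>AT</td> <td>attention transfer</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>Fitnet</td> <td>hints for thin deep nets</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>NST</td> <td>neural selective transfer</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>PKT</td> <td>probabilistic knowledge transfer</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>FSP</td> <td>flow of solution procedure</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>FT</td> <td>factor transfer</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>RKD</td> <td>relational knowledge distillation</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>AB</td> <td>activation boundary</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>SP</td> <td>similarity preservation</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>Sobolev</td> <td>sobolev/jacobian matching</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>BSS</td> <td>boundary supporting samples</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>CC</td> <td>correlation congruence</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>LwM</td> <td>learning without memorizing</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>IRG</td> <td>instance relationship graph</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>VID</td> <td>variational information distillation</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>OFD</td> <td>overhaul of feature distillation</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>AFD</td> <td>attention feature distillation</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>CRD</td> <td>contrastive representation distillation</td> <td>paper</td> <td>code</td> </tr>
<tr> <td>DML</td> <td>deep mutual learning</td> <td>paper</td> <td>code</td> </tr>
</table>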
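
To make the two simplest entries above concrete, here is a minimal, framework-free sketch of the Logits loss (regressing the teacher's logits) and the ST loss (matching temperature-softened distributions). It is not the code from this repo, which is written in PyTorch. The function names and the default temperature of 4.0 are assumptions chosen for illustration; the scaling by the squared temperature follows the usual soft target formulation.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits at a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def logits_loss(student_logits, teacher_logits):
    """Logits: mean squared error between student and teacher logits."""
    n = len(student_logits)
    return sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / n


def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """ST: KL(teacher || student) on softened distributions, scaled by T^2.

    The default temperature is an illustrative assumption, not a value
    taken from this repo.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [0.1, 2.5, 0.3]
    student = [3.0, 1.0, 0.2]
    print("Logits loss:", logits_loss(student, teacher))
    print("ST loss:", soft_target_loss(student, teacher))
```

In practice either term is usually added to the student's ordinary cross-entropy loss with a weighting factor; both losses are zero when the student reproduces the teacher's logits exactly.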

Datasets

Networks

The networks are the same as those in Table 6 of the paper.

Training

Results

<table> <tr> <td>Teacher</td> <td>Student</td> <td>Name</td> <td>CIFAR10</td> <td>CIFAR100</td> </tr> <tr> <td>-</td> <td>resnet-20</td> <td>Baseline</td> <td>92.37%</td> <td>68.92%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>Logits</td> <td>93.30%</td> <td>70.36%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>ST</td> <td>93.12%</td> <td>70.27%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>AT</td> <td>92.89%</td> <td>69.70%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>Fitnet</td> <td>92.73%</td> <td>70.08%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>NST</td> <td>92.79%</td> <td>69.21%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>PKT</td> <td>92.50%</td> <td>69.25%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>FSP</td> <td>92.76%</td> <td>69.61%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>FT</td> <td>92.98%</td> <td>69.90%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>RKD</td> <td>92.72%</td> <td>69.48%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>AB</td> <td>93.04%</td> <td>69.96%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>SP</td> <td>92.88%</td> <td>69.85%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>Sobolev</td> <td>92.78%</td> <td>69.39%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>BSS</td> <td>92.58%</td> <td>69.96%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>CC</td> <td>93.01%</td> <td>69.27%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>LwM</td> <td>92.80%</td> <td>69.23%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>IRG</td> <td>92.77%</td> <td>70.37%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>VID</td> <td>92.61%</td> <td>69.39%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>OFD</td> <td>92.82%</td> <td>69.93%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>AFD</td> <td>92.56%</td> <td>69.63%</td> </tr> <tr> 
<td>resnet-20</td> <td>resnet-20</td> <td>CRD</td> <td>92.96%</td> <td>70.33%</td> </tr> </table> <table> <tr> <td>Teacher</td> <td>Student</td> <td>Name</td> <td>CIFAR10</td> <td>CIFAR100</td> </tr> <tr> <td>-</td> <td>resnet-20</td> <td>Baseline</td> <td>92.37%</td> <td>68.92%</td> </tr> <tr> <td>-</td> <td>resnet-110</td> <td>Baseline</td> <td>93.86%</td> <td>73.15%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>Logits</td> <td>92.98%</td> <td>69.78%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>ST</td> <td>92.82%</td> <td>70.06%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>AT</td> <td>93.21%</td> <td>69.28%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>Fitnet</td> <td>93.04%</td> <td>69.81%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>NST</td> <td>92.83%</td> <td>69.31%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>PKT</td> <td>93.01%</td> <td>69.31%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>FSP</td> <td>92.78%</td> <td>69.78%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>FT</td> <td>93.01%</td> <td>69.49%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>RKD</td> <td>93.21%</td> <td>69.36%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>AB</td> <td>92.96%</td> <td>69.41%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>SP</td> <td>93.30%</td> <td>69.45%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>Sobolev</td> <td>92.60%</td> <td>69.23%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>BSS</td> <td>92.78%</td> <td>69.71%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>CC</td> <td>92.98%</td> <td>69.33%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>LwM</td> <td>92.52%</td> <td>69.11%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>IRG</td> <td>93.13%</td> <td>69.36%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>VID</td> <td>92.98%</td> <td>69.49%</td> 
</tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>OFD</td> <td>93.13%</td> <td>69.81%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>AFD</td> <td>92.92%</td> <td>69.60%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>CRD</td> <td>92.92%</td> <td>70.80%</td> </tr> </table> <table> <tr> <td>Teacher</td> <td>Student</td> <td>Name</td> <td>CIFAR10</td> <td>CIFAR100</td> </tr> <tr> <td>-</td> <td>resnet-110</td> <td>Baseline</td> <td>93.86%</td> <td>73.15%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>Logits</td> <td>94.38%</td> <td>74.89%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>ST</td> <td>94.59%</td> <td>74.33%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>AT</td> <td>94.42%</td> <td>74.64%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>Fitnet</td> <td>94.43%</td> <td>73.63%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>NST</td> <td>94.43%</td> <td>73.55%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>PKT</td> <td>94.35%</td> <td>73.74%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>FSP</td> <td>94.39%</td> <td>73.59%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>FT</td> <td>94.30%</td> <td>74.72%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>RKD</td> <td>94.39%</td> <td>73.78%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>AB</td> <td>94.63%</td> <td>73.91%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>SP</td> <td>94.45%</td> <td>74.07%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>Sobolev</td> <td>94.26%</td> <td>73.14%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>BSS</td> <td>94.19%</td> <td>73.87%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>CC</td> <td>94.49%</td> <td>74.43%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>LwM</td> <td>94.19%</td> <td>73.28%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>IRG</td> 
<td>94.44%</td> <td>74.96%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>VID</td> <td>94.25%</td> <td>73.63%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>OFD</td> <td>94.38%</td> <td>74.11%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>AFD</td> <td>94.44%</td> <td>73.90%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>CRD</td> <td>94.30%</td> <td>75.44%</td> </tr> </table>
<table> <tr> <td>Net1</td> <td>Net2</td> <td>Name</td> <td>CIFAR10</td> <td>CIFAR100</td> </tr> <tr> <td>-</td> <td>resnet-20</td> <td>Baseline</td> <td>92.37%</td> <td>68.92%</td> </tr> <tr> <td>-</td> <td>resnet-110</td> <td>Baseline</td> <td>93.86%</td> <td>73.15%</td> </tr> <tr> <td>resnet-20</td> <td>resnet-20</td> <td>DML</td> <td>93.07%/93.37%</td> <td>70.39%/70.22%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-20</td> <td>DML</td> <td>94.45%/92.92%</td> <td>74.53%/70.29%</td> </tr> <tr> <td>resnet-110</td> <td>resnet-110</td> <td>DML</td> <td>94.74%/94.79%</td> <td>74.72%/75.55%</td> </tr> </table>
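
The DML rows list two accuracies per cell, which appear to correspond to Net1 and Net2, since in deep mutual learning both networks are trained together and each learns from the other. The following is a minimal, framework-free sketch of the per-sample DML objective, not the code from this repo: each network minimizes its own cross-entropy plus the KL divergence from its peer's prediction to its own. The function names are assumptions chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against its integer class label."""
    return -log_softmax(logits)[label]


def kl_divergence(target_logits, logits):
    """KL(softmax(target_logits) || softmax(logits))."""
    log_p = log_softmax(target_logits)
    log_q = log_softmax(logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def dml_losses(logits1, logits2, label):
    """Return (loss for Net1, loss for Net2) for a single sample.

    Each network treats the other's prediction as a fixed target when
    computing its own loss.
    """
    loss1 = cross_entropy(logits1, label) + kl_divergence(logits2, logits1)
    loss2 = cross_entropy(logits2, label) + kl_divergence(logits1, logits2)
    return loss1, loss2


if __name__ == "__main__":
    net1_logits = [2.0, 0.5, -1.0]
    net2_logits = [1.2, 1.0, -0.3]
    print(dml_losses(net1_logits, net2_logits, label=0))
```

When the two networks agree exactly, the KL terms vanish and each loss reduces to plain cross-entropy.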

Todo List

Requirements

Acknowledgements

This repo is partly based on the following repos; many thanks to their authors.

If you employ the listed KD methods in your research, please cite the corresponding papers.