Switchable Online Knowledge Distillation [ECCV 2022]

This repository is the official code for the paper "Switchable Online Knowledge Distillation" by Biao Qian, Yang Wang (corresponding author: yangwang@hfut.edu.cn), Hongzhi Yin, Richang Hong, Meng Wang (ECCV 2022, Tel-Aviv, Israel).

Introduction

To break through the bottlenecks caused by the gap between teacher and student (e.g., why and when does a large gap harm the performance, especially for the student? how can the gap between teacher and student be quantified?), we deeply analyze the adverse impact of a large gap on the student and propose Switchable Online Knowledge Distillation (SwitOKD); see Figure 1. Instead of focusing on the accuracy gap at the test phase, as existing arts do, the core idea of SwitOKD is to adaptively calibrate the gap at the training phase, namely the distillation gap, via a switching strategy between two training modes: expert mode (pause the teacher while keeping the student learning) and learning mode (restart the teacher and train reciprocally from scratch). To endow SwitOKD with the capacity to yield an appropriate distillation gap, we further devise an adaptive switching threshold, which provides a formal criterion for when to switch to learning mode or expert mode, and thus improves the student's performance. Meanwhile, the teacher remains basically on a par with that of other online arts.

Figure 1 Illustration of the proposed SwitOKD framework.
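
The switching strategy can be summarized with a small training-loop sketch. Everything below (the L1-based gap measure, the fixed threshold, the KL-divergence distillation losses, and the toy linear networks) is an illustrative assumption, not the exact formulation used in the paper or in this repository, where the switching threshold is derived adaptively.

```python
# Minimal sketch of the expert/learning mode switch (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(logits, target_logits, T=3.0):
    """Standard KL-based distillation loss with temperature T."""
    p_target = F.softmax(target_logits / T, dim=1)
    log_p = F.log_softmax(logits / T, dim=1)
    return F.kl_div(log_p, p_target, reduction="batchmean") * (T * T)

def distillation_gap(student_logits, teacher_logits):
    """Gap between teacher and student outputs (here: mean L1 distance)."""
    p_s = F.softmax(student_logits, dim=1)
    p_t = F.softmax(teacher_logits, dim=1)
    return (p_t - p_s).abs().sum(dim=1).mean()

teacher = nn.Linear(32, 10)   # stand-ins for the real networks
student = nn.Linear(32, 10)
opt_t = torch.optim.SGD(teacher.parameters(), lr=0.1)
opt_s = torch.optim.SGD(student.parameters(), lr=0.1)
threshold = 0.1               # placeholder; the paper derives this adaptively

for step in range(100):
    x = torch.randn(64, 32)
    y = torch.randint(0, 10, (64,))
    t_logits = teacher(x)
    s_logits = student(x)

    gap = distillation_gap(s_logits, t_logits).item()
    expert_mode = gap > threshold   # large gap: pause the teacher

    # The student always learns from the labels and from the teacher.
    loss_s = F.cross_entropy(s_logits, y) + kd_loss(s_logits, t_logits.detach())
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()

    if not expert_mode:
        # Learning mode: the teacher keeps training (here from labels and the student).
        loss_t = F.cross_entropy(t_logits, y) + kd_loss(t_logits, s_logits.detach())
        opt_t.zero_grad()
        loss_t.backward()
        opt_t.step()
```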

Building on the above, to endow SwitOKD with the extensibility to multi-network settings with a large distillation gap, we construct two types of fundamental basis topologies: multiple teachers vs. one student and one teacher vs. multiple students. In the implementation, we take 3 networks as an example and denote these basis topologies as 2T1S and 1T2S, respectively; see Figure 2.

Figure 2 The multi-network framework for training 3 networks simultaneously, including two fundamental basis topologies: 2T1S (left) and 1T2S (right).
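
As a structural illustration only, the 1T2S topology can be thought of as one shared teacher paired with each student, with each pair keeping its own switching state. The wiring below is a hypothetical sketch; how the pairwise losses and switches are actually combined in 2T1S/1T2S is defined in the paper and the code.

```python
import torch.nn as nn

# Hypothetical 1T2S wiring: one shared teacher, two students (toy modules).
teacher = nn.Linear(32, 10)
students = [nn.Linear(32, 10), nn.Linear(32, 10)]

# Each (teacher, student) pair tracks its own expert/learning mode, so the
# teacher can be paused with respect to one student while still training with
# the other; this per-pair bookkeeping is an assumption for illustration.
pairs = [{"teacher": teacher, "student": s, "expert_mode": False} for s in students]
```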

Requirements

Usages

To train the student and teacher models described in the paper, run the following command:

python ./SwitOKD_code/Tiny-ImageNet/Train/SwitOKD/main.py

To evaluate the trained student or teacher model, run the following command:

python ./SwitOKD_code/Tiny-ImageNet/Test/SwitOKD/main.py

Results

The performance of our models is measured by Top-1 classification accuracy (%), which is reported below:

Results table (Table 4 in the paper)

Results table (Table 7 in the paper)
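
For reference, Top-1 accuracy is simply the percentage of test samples whose highest-scoring class matches the ground-truth label; a minimal, repo-independent sketch:

```python
import torch

def top1_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Percentage of samples whose argmax prediction equals the label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item() * 100.0

logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = torch.tensor([0, 2])
print(top1_accuracy(logits, labels))  # 50.0
```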

Visual results of our models are obtained by Grad-CAM visualization as follows:

The visual analysis confirms that our adaptive switching strategy enables the student to mimic the teacher well, and thus improves the classification accuracy.

Figure 3 Visual analysis of why SwitOKD works.
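
For readers unfamiliar with Grad-CAM, the sketch below shows the generic hook-based recipe (a ReLU-clamped, gradient-weighted sum of the activations of a chosen convolutional layer). The tiny network and the layer choice are placeholders, not this repository's models or visualization code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Toy CNN used only to demonstrate the Grad-CAM mechanics."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        f = self.features(x)
        pooled = F.adaptive_avg_pool2d(f, 1).flatten(1)
        return self.classifier(pooled)

def grad_cam(model, target_layer, image, class_idx):
    """Return a normalized [h, w] heatmap for `class_idx` w.r.t. `target_layer`."""
    activations, gradients = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    logits = model(image)                 # image: [1, 3, H, W]
    model.zero_grad()
    logits[0, class_idx].backward()       # gradients of the chosen class score
    h1.remove(); h2.remove()

    acts, grads = activations[0], gradients[0]       # both [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)   # global-average-pooled grads
    cam = F.relu((weights * acts).sum(dim=1))        # [1, h, w]
    cam = cam / (cam.max() + 1e-8)
    return cam[0].detach()

model = TinyNet().eval()
img = torch.randn(1, 3, 64, 64)
heatmap = grad_cam(model, model.features[2], img, class_idx=3)
print(heatmap.shape)  # torch.Size([64, 64])
```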

Citation

@inproceedings{qian2022switchable,
  title={Switchable online knowledge distillation},
  author={Qian, Biao and Wang, Yang and Yin, Hongzhi and Hong, Richang and Wang, Meng},
  booktitle={European Conference on Computer Vision},
  pages={449--466},
  year={2022},
  organization={Springer}
}