Awesome Knowledge-Distillation
Different forms of knowledge
Knowledge from logits
- Distilling the knowledge in a neural network. Hinton et al. arXiv:1503.02531
- Learning from Noisy Labels with Distillation. Li, Yuncheng et al. ICCV 2017
- Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students. arXiv:1805.05551
- Learning Metrics from Teachers: Compact Networks for Image Embedding. Yu, Lu et al. CVPR 2019
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- On Knowledge Distillation from Complex Networks for Response Prediction. Arora, Siddhartha et al. NAACL 2019
- On the Efficacy of Knowledge Distillation. Cho, Jang Hyun & Hariharan, Bharath. arXiv:1910.01348. ICCV 2019
- Revisit Knowledge Distillation: a Teacher-free Framework (Revisiting Knowledge Distillation via Label Smoothing Regularization). Yuan, Li et al. CVPR 2020 [code]
- Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher. Mirzadeh et al. arXiv:1902.03393
- Ensemble Distribution Distillation. ICLR 2020
- Noisy Collaboration in Knowledge Distillation. ICLR 2020
- On Compressing U-net Using Knowledge Distillation. arXiv:1812.00249
- Self-training with Noisy Student improves ImageNet classification. Xie, Qizhe et al. (Google) CVPR 2020
- Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework. AAAI 2020
- Preparing Lessons: Improve Knowledge Distillation with Better Supervision. arXiv:1911.07471
- Adaptive Regularization of Labels. arXiv:1908.05474
- Positive-Unlabeled Compression on the Cloud. Xu, Yixing et al. (HUAWEI) NeurIPS 2019
- Snapshot Distillation: Teacher-Student Optimization in One Generation. Yang, Chenglin et al. CVPR 2019
- QUEST: Quantized embedding space for transferring knowledge. Jain, Himalaya et al. arXiv:2020
- Conditional teacher-student learning. Z. Meng et al. ICASSP 2019
- Subclass Distillation. Müller, Rafael et al. arXiv:2002.03936
- MarginDistillation: distillation for margin-based softmax. Svitov, David & Alyamkin, Sergey. arXiv:2003.02586
- An Embarrassingly Simple Approach for Knowledge Distillation. Gao, Mengya et al. MLR 2018
- Sequence-Level Knowledge Distillation. Kim, Yoon & Rush, Alexander M. arXiv:1606.07947
- Boosting Self-Supervised Learning via Knowledge Transfer. Noroozi, Mehdi et al. CVPR 2018
- Meta Pseudo Labels. Pham, Hieu et al. ICML 2020 [code]
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model. CVPR 2020 [code]
- Distilled Binary Neural Network for Monaural Speech Separation. Chen, Xiuyi et al. IJCNN 2018
- Teacher-Class Network: A Neural Network Compression Mechanism. Malik et al. arXiv:2004.03281
- Deeply-supervised knowledge synergy. Sun, Dawei et al. CVPR 2019
- What it Thinks is Important is Important: Robustness Transfers through Input Gradients. Chan, Alvin et al. CVPR 2020
- Triplet Loss for Knowledge Distillation. Oki, Hideki et al. IJCNN 2020
- Role-Wise Data Augmentation for Knowledge Distillation. ICLR 2020 [code]
- Distilling Spikes: Knowledge Distillation in Spiking Neural Networks. arXiv:2005.00288
- Improved Noisy Student Training for Automatic Speech Recognition. Park et al. arXiv:2005.09629
- Learning from a Lightweight Teacher for Efficient Knowledge Distillation. Yuang Liu et al. arXiv:2005.09163
- ResKD: Residual-Guided Knowledge Distillation. Li, Xuewei et al. arXiv:2006.04719
- Distilling Effective Supervision from Severe Label Noise. Zhang, Zizhao et al. CVPR 2020 [code]
- Knowledge Distillation Meets Self-Supervision. Xu, Guodong et al. ECCV 2020 [code]
- Self-supervised Knowledge Distillation for Few-shot Learning. arXiv:2006.09785 [code]
- Learning with Noisy Class Labels for Instance Segmentation. ECCV 2020
- Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation. Wang, Liwei et al. arXiv:2007.01951
- Deep Streaming Label Learning. Wang, Zhen et al. ICML 2020 [code]
- Teaching with Limited Information on the Learner's Behaviour. Zhang, Yonggang et al. ICML 2020
- Discriminability Distillation in Group Representation Learning. Zhang, Manyuan et al. ECCV 2020
- Local Correlation Consistency for Knowledge Distillation. ECCV 2020
- Prime-Aware Adaptive Distillation. Zhang, Youcai et al. ECCV 2020
- One Size Doesn't Fit All: Adaptive Label Smoothing. Krothapalli et al. arXiv:2009.06432
- Learning to learn from noisy labeled data. Li, Junnan et al. CVPR 2019
- Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization. Wei, Hongxin et al. CVPR 2020
- Online Knowledge Distillation via Multi-branch Diversity Enhancement. Li, Zheng et al. ACCV 2020
- Pea-KD: Parameter-efficient and Accurate Knowledge Distillation. arXiv:2009.14822
- Extending Label Smoothing Regularization with Self-Knowledge Distillation. Wang, Jiyue et al. arXiv:2009.05226
- Spherical Knowledge Distillation. Guo, Jia et al. arXiv:2010.07485
- Soft-Label Dataset Distillation and Text Dataset Distillation. arXiv:1910.02551
- Wasserstein Contrastive Representation Distillation. Chen, Liqun et al. CVPR 2021
- Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup. Xu, Guodong et al. CVPR 2021 [code]
- Knowledge Refinery: Learning from Decoupled Label. Ding, Qianggang et al. AAAI 2021
- Rocket Launching: A Universal and Efficient Framework for Training Well-performing Light Net. Zhou, Guorui et al. AAAI 2018
- Distilling Virtual Examples for Long-tailed Recognition. He, Yin-Yin et al. CVPR 2021
- Balanced Knowledge Distillation for Long-tailed Learning. Zhang, Shaoyu et al. arXiv:2104.10510
- Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation. Kim, Taehyeon et al. IJCAI 2021 [code]
- Not All Knowledge Is Created Equal. Li, Ziyun et al. arXiv:2106.01489
- Knowledge distillation: A good teacher is patient and consistent. Beyer et al. arXiv:2106.05237v1
- Hierarchical Self-supervised Augmented Knowledge Distillation. Yang et al. IJCAI 2021 [code]
Knowledge from intermediate layers
- Fitnets: Hints for thin deep nets. Romero, Adriana et al. arXiv:1412.6550
- Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. Zagoruyko et al. ICLR 2017
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks. Zhang, Zhi et al. arXiv:1710.09505
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. Yim, Junho et al. CVPR 2017
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. Huang, Zehao & Wang, Naiyan. 2017
- Paraphrasing complex network: Network compression via factor transfer. Kim, Jangho et al. NeurIPS 2018
- Knowledge transfer with jacobian matching. ICML 2018
- Self-supervised knowledge distillation using singular value decomposition. Lee, Seung Hyun et al. ECCV 2018
- Learning Deep Representations with Probabilistic Knowledge Transfer. Passalis et al. ECCV 2018
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Knowledge Distillation via Route Constrained Optimization. Jin, Xiao et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick & Mori, Greg. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang, He, Zhankui, and Xue, Xiangyang. AAAI 2019
- A Comprehensive Overhaul of Feature Distillation. Heo, Byeongho et al. ICCV 2019 [code]
- Distilling Object Detectors with Fine-grained Feature Imitation. ICLR 2020
- Knowledge Squeezed Adversarial Network Compression. Shu, Changyong et al. AAAI 2020
- Stagewise Knowledge Distillation. Kulkarni, Akshay et al. arXiv:1911.06786
- Knowledge Distillation from Internal Representations. AAAI 2020
- Knowledge Flow: Improve Upon Your Teachers. ICLR 2019
- LIT: Learned Intermediate Representation Training for Model Compression. ICML 2019
- Improving the Adversarial Robustness of Transfer Learning via Noisy Feature Distillation. Chin, Ting-wu et al. arXiv:2002.02998
- Knapsack Pruning with Inner Distillation. Aflalo, Yonathan et al. arXiv:2002.08258
- Residual Knowledge Distillation. Gao, Mengya et al. arXiv:2002.09168
- Knowledge distillation via adaptive instance normalization. Yang, Jing et al. arXiv:2003.04289
- Bert-of-Theseus: Compressing bert by progressive module replacing. Xu, Canwen et al. arXiv:2002.02925 [code]
- Distilling Spikes: Knowledge Distillation in Spiking Neural Networks. arXiv:2005.00727
- Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks. Meet et al. arXiv:2005.08110
- Feature-map-level Online Adversarial Knowledge Distillation. Chung, Inseop et al. ICML 2020
- Channel Distillation: Channel-Wise Attention for Knowledge Distillation. Zhou, Zaida et al. arXiv:2006.01683 [code]
- Matching Guided Distillation. ECCV 2020 [code]
- Differentiable Feature Aggregation Search for Knowledge Distillation. ECCV 2020
- Interactive Knowledge Distillation. Fu, Shipeng et al. arXiv:2007.01476
- Feature Normalized Knowledge Distillation for Image Classification. ECCV 2020 [code]
- Layer-Level Knowledge Distillation for Deep Neural Networks. Li, Hao Ting et al. Applied Sciences, 2019
- Knowledge Distillation with Feature Maps for Image Classification. Chen, Weichun et al. ACCV 2018
- Efficient Kernel Transfer in Knowledge Distillation. Qian, Qi et al. arXiv:2009.14416
- Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition. arXiv:2009.06902
- Kernel Based Progressive Distillation for Adder Neural Networks. Xu, Yixing et al. NeurIPS 2020
- Feature Distillation With Guided Adversarial Contrastive Learning. Bai, Tao et al. arXiv:2009.09922
- Pay Attention to Features, Transfer Learn Faster CNNs. Wang, Kafeng et al. ICLR 2019
- Multi-level Knowledge Distillation. Ding, Fei et al. arXiv:2012.00573
- Cross-Layer Distillation with Semantic Calibration. Chen, Defang et al. AAAI 2021 [code]
- Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures. Wang, Xinglu & Li, Yingming. AAAI 2021
- Robust Knowledge Transfer via Hybrid Forward on the Teacher-Student Model. Song, Liangchen et al. AAAI 2021
- Show, Attend and Distill: Knowledge Distillation via Attention-Based Feature Matching. Ji, Mingi et al. AAAI 2021 [code]
- MINILMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. Wang, Wenhui et al. arXiv:2012.15828
- ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. Peyman et al. AAAI 2021
- In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation. Reyhan et al. arXiv:2103.00053
- Fixing the Teacher-Student Knowledge Discrepancy in Distillation. Han, Jiangfan et al. arXiv:2103.16844
- Student Network Learning via Evolutionary Knowledge Distillation. Zhang, Kangkai et al. arXiv:2103.13811
- Distilling Knowledge via Knowledge Review. Chen, Pengguang et al. CVPR 2021
- Knowledge Distillation By Sparse Representation Matching. Tran et al. arXiv:2103.17012
- Task-Oriented Feature Distillation. Zhang et al. NeurIPS 2020 [code]
- Adversarial Knowledge Transfer from Unlabeled Data. Gupta et al. ACM-MM 2020 [code]
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability. He et al. CVPR 2022
- PDF-Distil: Including Prediction Disagreements in Feature-based Knowledge Distillation for Object Detection. Zhang et al. BMVC 2021 [code]
Graph-based
- Graph-based Knowledge Distillation by Multi-head Attention Network. Lee, Seunghyun and Song, Byung Cheol. arXiv:1907.02226
- Graph Representation Learning via Multi-task Knowledge Distillation. arXiv:1911.05700
- Deep geometric knowledge distillation with graphs. arXiv:1911.03080
- Better and faster: Knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification. IJCAI 2018
- Distillating Knowledge from Graph Convolutional Networks. Yang, Yiding et al. CVPR 2020 [code]
- Saliency Prediction with External Knowledge. Zhang, Yifeng et al. arXiv:2007.13839
- Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge. Huang, He et al. arXiv:2007.15610
- Reliable Data Distillation on Graph Convolutional Network. Zhang, Wentao et al. ACM SIGMOD 2020
- Mutual Teaching for Graph Convolutional Networks. Zhan, Kun et al. Future Generation Computer Systems, 2021
- DistilE: Distiling Knowledge Graph Embeddings for Faster and Cheaper Reasoning. Zhu, Yushan et al. arXiv:2009.05912
- Distill2Vec: Dynamic Graph Representation Learning with Knowledge Distillation. Antaris, Stefanos & Rafailidis, Dimitrios. arXiv:2011.05664
- On Self-Distilling Graph Neural Network. Chen, Yuzhao et al. arXiv:2011.02255
- Iterative Graph Self Distillation. ICLR 2021
- Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework. Yang, Cheng et al. WWW 2021 [code]
- Graph Distillation for Action Detection with Privileged Information in RGB-D Videos. Luo, Zelun et al. ECCV 2018
- Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification. Liu, Xiaobin & Zhang, Shiliang. IJCAI 2021
Mutual Information & Online Learning
- Correlation Congruence for Knowledge Distillation. Peng, Baoyun et al. ICCV 2019
- Similarity-Preserving Knowledge Distillation. Tung, Frederick & Mori, Greg. ICCV 2019
- Variational Information Distillation for Knowledge Transfer. Ahn, Sungsoo et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. ICLR 2020 [RepDistill]
- Online Knowledge Distillation via Collaborative Learning. Guo, Qiushan et al. CVPR 2020
- Peer Collaborative Learning for Online Knowledge Distillation. Wu, Guile & Gong, Shaogang. AAAI 2021
- Knowledge Transfer via Dense Cross-layer Mutual-distillation. ECCV 2020
- MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution. Yang, Taojiannan et al. ECCV 2020 [code]
- AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation. ECCV 2020
- Towards Cross-modality Medical Image Segmentation with Online Mutual Knowledge. Li, Kang et al. AAAI 2021
- Federated Knowledge Distillation. Seo, Hyowoon et al. arXiv:2011.02367
- Unsupervised Image Segmentation using Mutual Mean-Teaching. Wu, Zhichao et al. arXiv:2012.08922
- Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning. Cai, Zhaowei et al. arXiv:2101.08482
- Robust Mutual Learning for Semi-supervised Semantic Segmentation. Zhang, Pan et al. arXiv:2106.00609
- Mutual Contrastive Learning for Visual Representation Learning. Yang et al. AAAI 2022 [code]
- Information Theoretic Representation Distillation. Miles et al. BMVC 2022 [code]
Self-KD
- Moonshine: Distilling with Cheap Convolutions. Crowley, Elliot J. et al. NeurIPS 2018
- Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation. Zhang, Linfeng et al. ICCV 2019
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- BAM! Born-Again Multi-Task Networks for Natural Language Understanding. Clark, Kevin et al. ACL 2019 (short)
- Self-Knowledge Distillation in Natural Language Processing. Hahn, Sangchul and Choi, Heeyoul. arXiv:1908.01851
- Rethinking Data Augmentation: Self-Supervision and Self-Distillation. Lee, Hankook et al. ICLR 2020
- MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks. arXiv:1911.09418
- Self-Distillation Amplifies Regularization in Hilbert Space. Mobahi, Hossein et al. NeurIPS 2020
- MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Wang, Wenhui et al. arXiv:2002.10957
- Regularizing Class-wise Predictions via Self-knowledge Distillation. CVPR 2020 [code]
- Self-Distillation as Instance-Specific Label Smoothing. Zhang, Zhilu & Sabuncu, Mert R. NeurIPS 2020
- Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training. Chen, Xuxi et al. ICML 2020 [code]
- S2SD: Simultaneous Similarity-based Self-Distillation for Deep Metric Learning. Karsten et al. ICML 2021
- Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection. Huang, Zeyi et al. NeurIPS 2020
- Distillation-Based Training for Multi-Exit Architectures. Phuong, Mary and Lampert, Christoph H. ICCV 2019
- Pair-based self-distillation for semi-supervised domain adaptation. ICLR 2021
- SEED: SElf-SupErvised Distillation. ICLR 2021
- Self-Feature Regularization: Self-Feature Distillation Without Teacher Models. Fan, Wenxuan & Hou, Zhenyan. arXiv:2103.07350
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation. Ji, Mingi et al. CVPR 2021 [code]
- SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud. Zheng, Wu et al. CVPR 2021 [code]
- Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification. Ge, Yixiao et al. CVPR 2021
- Towards Compact Single Image Super-Resolution via Contrastive Self-distillation. IJCAI 2021
- DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers. [paper]
- Knowledge Distillation with the Reused Teacher Classifier. [paper]
- Self-Distillation from the Last Mini-Batch for Consistency Regularization. [paper]
- Decoupled Knowledge Distillation. [paper]
Structural Knowledge
- Paraphrasing Complex Network: Network Compression via Factor Transfer. Kim, Jangho et al. NeurIPS 2018
- Relational Knowledge Distillation. Park, Wonpyo et al. CVPR 2019
- Knowledge Distillation via Instance Relationship Graph. Liu, Yufan et al. CVPR 2019
- Contrastive Representation Distillation. Tian, Yonglong et al. ICLR 2020
- Teaching To Teach By Structured Dark Knowledge. ICLR 2020
- Inter-Region Affinity Distillation for Road Marking Segmentation. Hou, Yuenan et al. CVPR 2020 [code]
- Heterogeneous Knowledge Distillation using Information Flow Modeling. Passalis et al. CVPR 2020 [code]
- Asymmetric metric learning for knowledge transfer. Budnik, Mateusz & Avrithis, Yannis. arXiv:2006.16331
- Local Correlation Consistency for Knowledge Distillation. ECCV 2020
- Few-Shot Class-Incremental Learning. Tao, Xiaoyu et al. CVPR 2020
- Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation. ECCV 2020
- Interpretable Foreground Object Search As Knowledge Distillation. ECCV 2020
- Improving Knowledge Distillation via Category Structure. ECCV 2020
- Few-Shot Class-Incremental Learning via Relation Knowledge Distillation. Dong, Songlin et al. AAAI 2021
- Complementary Relation Contrastive Distillation. Zhu, Jinguo et al. CVPR 2021
- Information Theoretic Representation Distillation. Miles et al. BMVC 2022 [code]
Privileged Information
- Learning using privileged information: similarity control and knowledge transfer. Vapnik, Vladimir and Izmailov, Rauf. JMLR 2015
- Unifying distillation and privileged information. Lopez-Paz, David et al. ICLR 2016
- Model compression via distillation and quantization. Polino, Antonio et al. ICLR 2018
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NeurIPS 2018
- Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Retaining privileged information for multi-task learning. Tang, Fengyi et al. KDD 2019
- A Generalized Meta-loss function for regression and classification using privileged information. Asif, Amina et al. arXiv:1811.06885
- Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks. Gao, Di & Zhuo, Cheng. AAAI 2020
- Privileged Knowledge Distillation for Online Action Detection. Zhao, Peisen et al. CVPR 2021
- Adversarial Distillation for Learning with Privileged Provisions. Wang, Xiaojie et al. TPAMI 2019
KD + GAN
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks. Xu, Zheng et al. arXiv:1709.00513
- KTAN: Knowledge Transfer Adversarial Network. Liu, Peiye et al. arXiv:1810.08126
- KDGAN: Knowledge Distillation with Generative Adversarial Networks. Wang, Xiaojie. NeurIPS 2018
- Adversarial Learning of Portable Student Networks. Wang, Yunhe et al. AAAI 2018
- Adversarial Network Compression. Belagiannis et al. ECCV 2018
- Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks. ICASSP 2018
- Adversarial Distillation for Efficient Recommendation with External Knowledge. TOIS 2018
- Training student networks for acceleration with conditional adversarial networks. Xu, Zheng et al. BMVC 2018
- DAFL: Data-Free Learning of Student Networks. Chen, Hanting et al. ICCV 2019
- MEAL: Multi-Model Ensemble via Adversarial Learning. Shen, Zhiqiang et al. AAAI 2019
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary. Heo, Byeongho et al. AAAI 2019
- Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection. Liu, Jian et al. AAAI 2019
- Adversarially Robust Distillation. Goldblum, Micah et al. AAAI 2020
- GAN-Knowledge Distillation for one-stage Object Detection. Hong, Wei et al. arXiv:1906.08467
- Lifelong GAN: Continual Learning for Conditional Image Generation. Kundu et al. arXiv:1908.03884
- Compressing GANs using Knowledge Distillation. Aguinaldo, Angeline et al. arXiv:1902.00159
- Feature-map-level Online Adversarial Knowledge Distillation. ICML 2020
- MineGAN: effective knowledge transfer from GANs to target domains with few images. Wang, Yaxing et al. CVPR 2020
- Distilling portable Generative Adversarial Networks for Image Translation. Chen, Hanting et al. AAAI 2020
- GAN Compression: Efficient Architectures for Interactive Conditional GANs. Junyan Zhu et al. CVPR 2020 [code]
- P-KDGAN: Progressive Knowledge Distillation with GANs for One-class Novelty Detection. Zhang, Zhiwei et al. IJCAI 2020
- StyleGAN2 Distillation for Feed-forward Image Manipulation. Viazovetskyi et al. ECCV 2020 [code]
- HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing. ECCV 2020
- TinyGAN: Distilling BigGAN for Conditional Image Generation. ACCV 2020 [code]
- Learning Efficient GANs via Differentiable Masks and co-Attention Distillation. Li, Shaojie et al. arXiv:2011.08382 [code]
- Self-Supervised GAN Compression. Yu, Chong & Pool, Jeff. arXiv:2007.01491
- Teachers Do More Than Teach: Compressing Image-to-Image Models. CVPR 2021 [code]
- Distilling and Transferring Knowledge via cGAN-generated Samples for Image Classification and Regression. Ding, Xin et al. arXiv:2104.03164
- Content-Aware GAN Compression. Liu, Yuchen et al. CVPR 2021
KD + Meta-learning
- Few Sample Knowledge Distillation for Efficient Network Compression. Li, Tianhong et al. CVPR 2020
- Learning What and Where to Transfer. Jang, Yunhun et al. ICML 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
- Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation. arXiv:1911.05329v1
- Progressive Knowledge Distillation For Generative Modeling. ICLR 2020
- Few Shot Network Compression via Cross Distillation. AAAI 2020
- MetaDistiller: Network Self-boosting via Meta-learned Top-down Distillation. Liu, Benlin et al. ECCV 2020
- Few-Shot Learning with Intra-Class Knowledge Transfer. arXiv:2008.09892
- Few-Shot Object Detection via Knowledge Transfer. Kim, Geonuk et al. arXiv:2008.12496
- Distilled One-Shot Federated Learning. arXiv:2009.07999
- Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains. Pan, Haojie et al. arXiv:2012.01266
- Progressive Network Grafting for Few-Shot Knowledge Distillation. Shen, Chengchao et al. AAAI 2021
Data-free KD
- Data-Free Knowledge Distillation for Deep Neural Networks. NeurIPS 2017
- Zero-Shot Knowledge Distillation in Deep Networks. ICML 2019
- DAFL: Data-Free Learning of Student Networks. ICCV 2019
- Zero-shot Knowledge Transfer via Adversarial Belief Matching. Micaelli, Paul and Storkey, Amos. NeurIPS 2019
- Dream Distillation: A Data-Independent Model Compression Framework. Kartikeya et al. ICML 2019
- Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion. Yin, Hongxu et al. CVPR 2020 [code]
- Data-Free Adversarial Distillation. Fang, Gongfan et al. CVPR 2020
- The Knowledge Within: Methods for Data-Free Model Compression. Haroush, Matan et al. CVPR 2020
- Knowledge Extraction with No Observable Data. Yoo, Jaemin et al. NeurIPS 2019 [code]
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. CVPR 2020
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier. Addepalli, Sravanti et al. arXiv:1912.11960
- Generative Low-bitwidth Data Free Quantization. Xu, Shoukai et al. ECCV 2020 [code]
- This dataset does not exist: training models from generated images. arXiv:1911.02888
- MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation. Sanjay et al. arXiv:2005.03161
- Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data. Such et al. ECCV 2020
- Billion-scale semi-supervised learning for image classification. FAIR. arXiv:1905.00546 [code]
- Data-Free Network Quantization with Adversarial Knowledge Distillation. Choi, Yoojin et al. CVPRW 2020
- Adversarial Self-Supervised Data-Free Distillation for Text Classification. EMNLP 2020
- Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer. arXiv:2010.07334
- Data-free Knowledge Distillation for Segmentation using Data-Enriching GAN. Bhogale et al. arXiv:2011.00809
- Layer-Wise Data-Free CNN Compression. Horton, Maxwell et al. (Apple Inc.) CVPR 2021
- Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation. Nayak et al. WACV 2021
- Learning in School: Multi-teacher Knowledge Inversion for Data-Free Quantization. Li, Yuhang et al. CVPR 2021
- Large-Scale Generative Data-Free Distillation. Luo, Liangchen et al. CVPR 2021
- Domain Impression: A Source Data Free Domain Adaptation Method. Kurmi et al. WACV 2021
- Learning Student Networks in the Wild. (HUAWEI-Noah). CVPR 2021
- Data-Free Knowledge Distillation For Image Super-Resolution. (HUAWEI-Noah). CVPR 2021
- Zero-shot Adversarial Quantization. Liu, Yuang et al. CVPR 2021 [code]
- Source-Free Domain Adaptation for Semantic Segmentation. Liu, Yuang et al. CVPR 2021
- Data-Free Model Extraction. Jean-Baptiste et al. CVPR 2021 [code]
- Delving into Data: Effectively Substitute Training for Black-box Attack. CVPR 2021
- Zero-Shot Knowledge Distillation Using Label-Free Adversarial Perturbation With Taylor Approximation. Li, Kang et al. IEEE Access, 2021
- Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation. Huang, Zilong et al. arXiv:2104.00875
- Dual Discriminator Adversarial Distillation for Data-free Model Compression. Zhao, Haoran et al. TCSVT 2021
- See through Gradients: Image Batch Recovery via GradInversion. Yin, Hongxu et al. CVPR 2021
- Contrastive Model Inversion for Data-Free Knowledge Distillation. Fang, Gongfan et al. IJCAI 2021 [code]
- Graph-Free Knowledge Distillation for Graph Neural Networks. Deng, Xiang & Zhang, Zhongfei. arXiv:2105.07519
- Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model. Wang, Zi. ICML 2021
- Data-Free Knowledge Distillation for Heterogeneous Federated Learning. Zhu, Zhuangdi et al. ICML 2021
Other data-free model compression:
- Data-free Parameter Pruning for Deep Neural Networks. Srinivas, Suraj et al. arXiv:1507.06149
- Data-Free Quantization Through Weight Equalization and Bias Correction. Nagel, Markus et al. ICCV 2019
- DAC: Data-free Automatic Acceleration of Convolutional Networks. Li, Xin et al. WACV 2019
- A Privacy-Preserving DNN Pruning and Mobile Acceleration Framework. Zhan, Zheng et al. arXiv:2003.06513
- ZeroQ: A Novel Zero Shot Quantization Framework. Cai et al. CVPR 2020 [code]
- Diversifying Sample Generation for Data-Free Quantization. Zhang, Xiangguo et al. CVPR 2021
KD + AutoML
- Improving Neural Architecture Search Image Classifiers via Ensemble Learning. Macko, Vladimir et al. arXiv:1903.06236
- Blockwisely Supervised Neural Architecture Search with Knowledge Distillation. Li, Changlin et al. CVPR 2020
- Towards Oracle Knowledge Distillation with Neural Architecture Search. Kang, Minsoo et al. AAAI 2020
- Search for Better Students to Learn Distilled Knowledge. Gu, Jindong & Tresp, Volker. arXiv:2001.11612
- Circumventing Outliers of AutoAugment with Knowledge Distillation. Wei, Longhui et al. arXiv:2003.11342
- Network Pruning via Transformable Architecture Search. Dong, Xuanyi & Yang, Yi. NeurIPS 2019
- Search to Distill: Pearls are Everywhere but not the Eyes. Liu, Yu et al. CVPR 2020
- AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks. Fu, Yonggan et al. ICML 2020 [code]
- Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation. CVPR 2021
KD + RL
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Knowledge Flow: Improve Upon Your Teachers. Liu, Iou-jen et al. ICLR 2019
- Transferring Knowledge across Learning Processes. Moreno, Pablo G et al. ICLR 2019
- Exploration by random network distillation. Burda, Yuri et al. ICLR 2019
- Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning. Hong, Zhang-Wei et al. arXiv:2002.00149
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach. Xue, Zeyue et al. arXiv:2002.02202
- Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning. Cha, Han et al. arXiv:2005.06105
- Dual Policy Distillation. Lai, Kwei-Herng et al. IJCAI 2020
- Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location. El-Bouri, Rasheed et al. ICML 2020
- Reinforced Multi-Teacher Selection for Knowledge Distillation. Yuan, Fei et al. AAAI 2021
- Universal Trading for Order Execution with Oracle Policy Distillation. Fang, Yuchen et al. AAAI 2021
- Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation. Dunnhofer et al. IEEE RAL
KD + Self-supervised
- Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation. ECCV 2020
- Self-supervised Label Augmentation via Input Transformations. Lee, Hankook et al. ICML 2020 [code]
- Improving Object Detection with Selective Self-supervised Self-training. Li, Yandong et al. ECCV 2020
- Distilling Visual Priors from Self-Supervised Learning. Zhao, Bingchen & Wen, Xin. ECCVW 2020
- Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. Grill et al. arXiv:2006.07733 [code]
- Unpaired Learning of Deep Image Denoising. Wu, Xiaohe et al. arXiv:2008.13711 [code]
- SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive Person Re-Identification. Yin, Junhui et al. arXiv:2009.05972
- Introspective Learning by Distilling Knowledge from Online Self-explanation. Gu, Jindong et al. ACCV 2020
- Robust Pre-Training by Adversarial Contrastive Learning. Jiang, Ziyu et al. NeurIPS 2020 [code]
- CompRess: Self-Supervised Learning by Compressing Representations. Koohpayegani et al. NeurIPS 2020 [code]
- Big Self-Supervised Models are Strong Semi-Supervised Learners. Chen, Ting et al. NeurIPS 2020 [code]
- Rethinking Pre-training and Self-training. Zoph, Barret et al. NeurIPS 2020 [code]
- ISD: Self-Supervised Learning by Iterative Similarity Distillation. Tejankar et al. CVPR 2021 [code]
- Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning. Li, Zeming et al. arXiv:2101.07525
- Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones. Cui, Cheng et al. arXiv:2103.05959
- Distilling Audio-Visual Knowledge by Compositional Contrastive Learning. Chen, Yanbei et al. CVPR 2021
- DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning. Gao, Yuting et al. arXiv:2104.09124
- Self-Ensembling Contrastive Learning for Semi-Supervised Medical Image Segmentation. Xiang, Jinxi et al. arXiv:2105.12924
- Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. Chen, Xiaokang et al. CVPR 2021
- Adversarial Knowledge Transfer from Unlabeled Data. Gupta et al. ACM-MM 2020 [code]
Multi-teacher and Ensemble KD
- Learning from Multiple Teacher Networks. You, Shan et al. KDD 2017
- Learning with single-teacher multi-student. You, Shan et al. AAAI 2018
- Knowledge distillation by on-the-fly native ensemble. Lan, Xu et al. NeurIPS 2018
- Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data. ICLR 2017
- Knowledge Adaptation: Teaching to Adapt. arXiv:1702.02052
- Deep Model Compression: Distilling Knowledge from Noisy Teachers. Sau, Bharat Bhusan et al. arXiv:1610.09650
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Tarvainen, Antti and Valpola, Harri. NeurIPS 2017
- Born-Again Neural Networks. Furlanello, Tommaso et al. ICML 2018
- Deep Mutual Learning. Zhang, Ying et al. CVPR 2018
- Collaborative learning for deep neural networks. Song, Guocong and Chai, Wei. NeurIPS 2018
- Data Distillation: Towards Omni-Supervised Learning. Radosavovic, Ilija et al. CVPR 2018
- Multilingual Neural Machine Translation with Knowledge Distillation. ICLR 2019
- Unifying Heterogeneous Classifiers with Distillation. Vongkulbhisal et al. CVPR 2019
- Distilled Person Re-Identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- Diversity with Cooperation: Ensemble Methods for Few-Shot Classification. Dvornik, Nikita et al. ICCV 2019
- Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Yang, Ze et al. WSDM 2020
- FEED: Feature-level Ensemble for Knowledge Distillation. Park, SeongUk and Kwak, Nojun. AAAI 2020
- Stochasticity and Skip Connection Improve Knowledge Transfer. Lee, Kwangjin et al. ICLR 2020
- Online Knowledge Distillation with Diverse Peers. Chen, Defang et al. AAAI 2020
- Hydra: Preserving Ensemble Diversity for Model Distillation. Tran, Linh et al. arXiv:2001.04694
- Distilled Hierarchical Neural Ensembles with Adaptive Inference Cost. Ruiz, Adria et al. arXiv:2003.01474
- Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition. Gao, Yan et al. arXiv:2005.09310
- Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery. ECCV 2020
- Collaborative Learning for Faster StyleGAN Embedding. Guan, Shanyan et al. arXiv:2007.01758
- Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection. Chen, Cong et al. IEEE 2020 [code]
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation. MICCAI 2020
- Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation. Nguyen-Meidine et al. WACV 2020
- Semi-supervised Learning with Teacher-student Network for Generalized Attribute Prediction. Shin, Minchul et al. ECCV 2020
- Knowledge Distillation for Multi-task Learning. Li, WeiHong & Bilen, Hakan. arXiv:2007.06889 [project]
- Adaptive Multi-Teacher Multi-level Knowledge Distillation. Liu, Yuang et al. Neurocomputing 2020 [code]
- Online Ensemble Model Compression using Knowledge Distillation. ECCV 2020
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification. ECCV 2020
- Group Knowledge Transfer: Collaborative Training of Large CNNs on the Edge. He, Chaoyang et al. arXiv:2007.14513
- Densely Guided Knowledge Distillation using Multiple Teacher Assistants. Son, Wonchul et al. arXiv:2009.08825
- ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition. Shi, Weidong et al. arXiv:2011.00265
- Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space. Du, Shangchen et al. NeurIPS 2020 [code]
- Reinforced Multi‐Teacher Selection for Knowledge Distillation. Yuan, Fei et al. AAAI 2021
- Class-Incremental Instance Segmentation via Multi‐Teacher Networks. Gu, Yanan et al. AAAI 2021
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer. Sun, Liyuan et al. arXiv:2101.08471
- Efficient Conditional GAN Transfer with Knowledge Propagation across Classes. Shahbazi et al. CVPR 2021 [code]
- Knowledge Evolution in Neural Networks. Taha, Ahmed et al. CVPR 2021 [code]
- Distilling a Powerful Student Model via Online Knowledge Distillation. Li, Shaojie et al. arXiv:2103.14473
Knowledge Amalgamation (KA) - zju-VIPA
- Amalgamating Knowledge towards Comprehensive Classification. Shen, Chengchao et al. AAAI 2019
- Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers. Ye, Jingwen et al. IJCAI 2019
- Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning. Luo, Sihui et al. IJCAI 2019
- Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More. Ye, Jingwen et al. CVPR 2019
- Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation. ICCV 2019
- Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN. CVPR 2020
Cross-modal / DA / Incremental Learning
- SoundNet: Learning Sound Representations from Unlabeled Video. Aytar, Yusuf et al. NeurIPS 2016
- Cross Modal Distillation for Supervision Transfer. Gupta, Saurabh et al. CVPR 2016
- Emotion recognition in speech using cross-modal transfer in the wild. Albanie, Samuel et al. ACM MM 2018
- Through-Wall Human Pose Estimation Using Radio Signals. Zhao, Mingmin et al. CVPR 2018
- Compact Trilinear Interaction for Visual Question Answering. Do, Tuong et al. ICCV 2019
- Cross-Modal Knowledge Distillation for Action Recognition. Thoker, Fida Mohammad and Gall, Juergen. ICIP 2019
- Learning to Map Nearly Anything. Salem, Tawfiq et al. arXiv:1909.06928
- Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval. Liu, Qing et al. ICCV 2019
- UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation. Kundu et al. ICCV 2019
- CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency. Chen, Yun-Chun et al. CVPR 2019
- XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings. ICLR 2020
- Effective Domain Knowledge Transfer with Soft Fine-tuning. Zhao, Zhichen et al. arXiv:1909.02236
- ASR is all you need: cross-modal distillation for lip reading. Afouras et al. arXiv:1911.12747v1
- Knowledge distillation for semi-supervised domain adaptation. arXiv:1908.07355
- Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. Meng, Zhong et al. arXiv:2001.01798
- Cluster Alignment with a Teacher for Unsupervised Domain Adaptation. ICCV 2019
- Attention Bridging Network for Knowledge Transfer. Li, Kunpeng et al. ICCV 2019
- Unpaired Multi-modal Segmentation via Knowledge Distillation. Dou, Qi et al. arXiv:2001.03111
- Multi-source Distilling Domain Adaptation. Zhao, Sicheng et al. arXiv:1911.11554
- Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing. Hu, Hengtong et al. CVPR 2020
- Improving Semantic Segmentation via Self-Training. Zhu, Yi et al. arXiv:2004.14960
- Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation. arXiv:2005.08213
- Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation. arXiv:2005.07839
- Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge. Zhao, Long et al. CVPR 2020
- Large-Scale Domain Adaptation via Teacher-Student Learning. Li, Jinyu et al. arXiv:1708.05466
- Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data. Fayek, Haytham M. & Kumar, Anurag. IJCAI 2020
- Distilling Cross-Task Knowledge via Relationship Matching. Ye, Han-Jia et al. CVPR 2020 [code]
- Modality distillation with multiple stream networks for action recognition. Garcia, Nuno C. et al. ECCV 2018
- Domain Adaptation through Task Distillation. Zhou, Brady et al. ECCV 2020 [code]
- Dual Super-Resolution Learning for Semantic Segmentation. Wang, Li et al. CVPR 2020 [code]
- Adaptively-Accumulated Knowledge Transfer for Partial Domain Adaptation. Jing, Taotao et al. ACM MM 2020
- Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation. Peng, Xingchao et al. ECCV 2020 [code]
- Unsupervised Domain Adaptive Knowledge Distillation for Semantic Segmentation. Kothandaraman et al. arXiv:2011.08007
- A Student‐Teacher Architecture for Dialog Domain Adaptation under the Meta‐Learning Setting. Qian, Kun et al. AAAI 2021
- Multimodal Fusion via Teacher‐Student Network for Indoor Action Recognition. Bruce et al. AAAI 2021
- Dual-Teacher++: Exploiting Intra-domain and Inter-domain Knowledge with Reliable Transfer for Cardiac Segmentation. Li, Kang et al. TMI 2021
- Knowledge Distillation Methods for Efficient Unsupervised Adaptation Across Multiple Domains. Nguyen et al. IVC 2021
- Feature-Supervised Action Modality Transfer. Thoker, Fida Mohammad and Snoek, Cees. ICPR 2020
- There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge. Francisco et al. CVPR 2021
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning. Abulikemu et al. CVPR 2021 [code]
- Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning. Cheraghian et al. CVPR 2021
- Distilling Causal Effect of Data in Class-Incremental Learning. Hu, Xinting et al. CVPR 2021 [code]
- Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation. Chen, Shuaijun et al. CVPR 2021
- PLOP: Learning without Forgetting for Continual Semantic Segmentation. Arthur et al. CVPR 2021
- Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations. Umberto & Pietro. CVPR 2021
- Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for Single Depth Super-Resolution. Sun, Baoli et al. CVPR 2021 [code]
- CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning. Wei, Chen et al. CVPR 2021
- Adaptive Boosting for Domain Adaptation: Towards Robust Predictions in Scene Segmentation. Zheng, Zhedong & Yang, Yi. CVPR 2021
- Image Classification in the Dark Using Quanta Image Sensors. Gnanasambandam, Abhiram & Chan, Stanley H. ECCV 2020
- Dynamic Low-Light Imaging with Quanta Image Sensors. Chi, Yiheng et al. ECCV 2020
- Visualizing Adapted Knowledge in Domain Transfer. Hou, Yunzhong & Zheng, Liang. CVPR 2021
- Neutral Cross-Entropy Loss Based Unsupervised Domain Adaptation for Semantic Segmentation. Xu, Hanqing et al. IEEE TIP 2021
- Zero-Shot Detection via Vision and Language Knowledge Distillation. Gu, Xiuye et al. arXiv:2104.13921
- Rethinking Ensemble-Distillation for Semantic Segmentation Based Unsupervised Domain Adaptation. Chao, Chen-Hao et al. CVPRW 2021
- Spirit Distillation: A Model Compression Method with Multi-domain Knowledge Transfer. Wu, Zhiyuan et al. arXiv:2104.14696
- A Fourier-based Framework for Domain Generalization. Xu, Qinwei et al. CVPR 2021
- KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation. Feng, Haozhe et al. ICML 2021
Application of KD
- Face model compression by distilling knowledge from neurons. Luo, Ping et al. AAAI 2016
- Learning efficient object detection models with knowledge distillation. Chen, Guobin et al. NeurIPS 2017
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. Mishra, Asit et al. NeurIPS 2018
- Distilled Person Re-identification: Towards a More Scalable System. Wu, Ancong et al. CVPR 2019
- Efficient Video Classification Using Fewer Frames. Bhardwaj, Shweta et al. CVPR 2019
- Fast Human Pose Estimation. Zhang, Feng et al. CVPR 2019
- Distilling knowledge from a deep pose regressor network. Saputra et al. arXiv:1908.00858 (2019)
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation. Hou, Yuenan et al. ICCV 2019
- Structured Knowledge Distillation for Semantic Segmentation. Liu, Yifan et al. CVPR 2019
- Relation Distillation Networks for Video Object Detection. Deng, Jiajun et al. ICCV 2019
- Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection. Dong, Xuanyi and Yang, Yi. ICCV 2019
- Progressive Teacher-student Learning for Early Action Prediction. Wang, Xionghui et al. CVPR 2019
- Lightweight Image Super-Resolution with Information Multi-distillation Network. Hui, Zheng et al. ICCVW 2019
- AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation. Tavakolian, Mohammad et al. ICCV 2019
- Dynamic Kernel Distillation for Efficient Pose Estimation in Videos. Nie, Xuecheng et al. ICCV 2019
- Teacher Guided Architecture Search. Bashivan, Pouya and Tensen, Mark. ICCV 2019
- Online Model Distillation for Efficient Video Inference. Mullapudi et al. ICCV 2019
- Distilling Object Detectors with Fine-grained Feature Imitation. Wang, Tao et al. CVPR 2019
- Knowledge Distillation for Incremental Learning in Semantic Segmentation. arXiv:1911.03462
- MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization. arXiv:1910.12295
- Teacher-Students Knowledge Distillation for Siamese Trackers. arXiv:1907.10586
- LaTeS: Latent Space Distillation for Teacher-Student Driving Policy Learning. Zhao, Albert et al. CVPR 2020(pre)
- Knowledge Distillation for Brain Tumor Segmentation. arXiv:2002.03688
- ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes. Chen, Yuhua et al. CVPR 2018
- Multi-Representation Knowledge Distillation For Audio Classification. Gao, Liang et al. arXiv:2002.09607
- Collaborative Distillation for Ultra-Resolution Universal Style Transfer. Wang, Huan et al. CVPR 2020 [code]
- ShadowTutor: Distributed Partial Distillation for Mobile Video DNN Inference. Chung, Jae-Won et al. ICPP 2020 [code]
- Object Relational Graph with Teacher-Recommended Learning for Video Captioning. Zhang, Ziqi et al. CVPR 2020
- Spatio-Temporal Graph for Video Captioning with Knowledge distillation. CVPR 2020 [code]
- Squeezed Deep 6DoF Object Detection Using Knowledge Distillation. Felix, Heitor et al. arXiv:2003.13586
- Distilled Semantics for Comprehensive Scene Understanding from Videos. Tosi, Fabio et al. arXiv:2003.14030
- Parallel WaveNet: Fast high-fidelity speech synthesis. van den Oord et al. ICML 2018
- Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning. Wang, Chaoyang et al. ICCV 2019
- KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow. Murugesan et al. MIDL 2020
- Geometry-Aware Distillation for Indoor Semantic Segmentation. Jiao, Jianbo et al. CVPR 2019
- Distill Image Dehazing with Heterogeneous Task Imitation. Hong, Ming et al. CVPR 2020
- Knowledge Distillation for Action Anticipation via Label Smoothing. Camporese et al. arXiv:2004.07711
- More Grounded Image Captioning by Distilling Image-Text Matching Model. Zhou, Yuanen et al. CVPR 2020
- Distilling Knowledge from Refinement in Multiple Instance Detection Networks. Zeni, Luis Felipe & Jung, Claudio. arXiv:2004.10943
- Enabling Incremental Knowledge Transfer for Object Detection at the Edge. arXiv:2004.05746
- Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings. Bergmann, Paul et al. CVPR 2020
- TA-Student VQA: Multi-Agents Training by Self-Questioning. Xiong, Peixi & Wu, Ying. CVPR 2020
- Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. Jiang, Lu et al. ICML 2018
- A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection. Chen, Zhihao et al. CVPR 2020 [code]
- Learning Lightweight Face Detector with Knowledge Distillation. Zhang, Shifeng et al. IEEE 2019
- Learning Lightweight Pedestrian Detector with Hierarchical Knowledge Distillation. ICIP 2019
- Distilling Object Detectors with Task Adaptive Regularization. Sun, Ruoyu et al. arXiv:2006.13108
- Intra-class Compactness Distillation for Semantic Segmentation. ECCV 2020
- DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild. ECCV 2020
- Robust Re-Identification by Multiple Views Knowledge Distillation. Porrello et al. ECCV 2020 [code]
- LabelEnc: A New Intermediate Supervision Method for Object Detection. Hao, Miao et al. arXiv:2007.03282
- Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer. Chen, Xinghao et al. ECCV 2020
- Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition. Si, Chenyang et al. ECCV 2020
- Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks. Zhang, Yonggang et al. ICML 2020
- RGB-IR Cross-modality Person ReID based on Teacher-Student GAN Model. Zhang, Ziyue et al. arXiv:2007.07452
- Defocus Blur Detection via Depth Distillation. Cun, Xiaodong & Pun, Chi-Man. ECCV 2020 [code]
- Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer. Zhong, Yuanyi et al. ECCV 2020 [code]
- Weight Decay Scheduling and Knowledge Distillation for Active Learning. ECCV 2020
- Circumventing Outliers of AutoAugment with Knowledge Distillation. ECCV 2020
- Improving Face Recognition from Hard Samples via Distribution Distillation Loss. ECCV 2020
- Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition. ECCV 2020
- Self-similarity Student for Partial Label Histopathology Image Segmentation. Cheng, Hsien-Tzu et al. ECCV 2020
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation. Zhou, Yanning et al. arXiv:2007.10787 [code]
- Two-Level Residual Distillation based Triple Network for Incremental Object Detection. Yang, Dongbao et al. arXiv:2007.13428
- Towards Unsupervised Crowd Counting via Regression-Detection Bi-knowledge Transfer. Liu, Yuting et al. ACM MM 2020
- Teacher-Critical Training Strategies for Image Captioning. Huang, Yiqing & Chen, Jiansheng. arXiv:2009.14405
- Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection. Wang, Yue et al. ECCV 2020
- Residual Feature Distillation Network for Lightweight Image Super-Resolution. Liu, Jie et al. ECCV 2020
- Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging. Interspeech 2020
- Federated Model Distillation with Noise-Free Differential Privacy. arXiv:2009.05537
- Long-tailed Recognition by Routing Diverse Distribution-Aware Experts. Wang, Xudong et al. arXiv:2010.01809
- Fast Video Salient Object Detection via Spatiotemporal Knowledge Distillation. Yi, Tang & Yuan, Li. arXiv:2010.10027
- Multiresolution Knowledge Distillation for Anomaly Detection. Salehi et al. CVPR 2021
- Channel-wise Distillation for Semantic Segmentation. Shu, Changyong et al. arXiv:2011.13256
- Teach me to segment with mixed supervision: Confident students become masters. Dolz, Jose et al. arXiv:2012.08051
- Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation. Xu, Chenxin et al. AAAI 2021 [code]
- Training data-efficient image transformers & distillation through attention. Touvron, Hugo et al. arXiv:2012.12877 [code]
- SID: Incremental Learning for Anchor-Free Object Detection via Selective and Inter-Related Distillation. Peng, Can et al. arXiv:2012.15439
- PSSM-Distil: Protein Secondary Structure Prediction (PSSP) on Low-Quality PSSM by Knowledge Distillation with Contrastive Learning. Wang, Qin et al. AAAI 2021
- Diverse Knowledge Distillation for End-to‐End Person Search. Zhang, Xinyu et al. AAAI 2021
- Enhanced Audio Tagging via Multi‐ to Single‐Modal Teacher‐Student Mutual Learning. Yin, Yifang et al. AAAI 2021
- Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks. Li, Yige et al. ICLR 2021 [code]
- Unbiased Teacher for Semi-Supervised Object Detection. Liu, Yen-Cheng et al. ICLR 2021 [code]
- Localization Distillation for Object Detection. Zheng, Zhaohui et al. CVPR 2021 [code]
- Distilling Knowledge via Intermediate Classifier Heads. Aryan & Amirali. arXiv:2103.00497
- Distilling Object Detectors via Decoupled Features. (HUAWEI-Noah). CVPR 2021
- General Instance Distillation for Object Detection. Dai, Xing et al. CVPR 2021
- Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection. Wang, Guodong et al. arXiv:2103.04257
- Teacher-Explorer-Student Learning: A Novel Learning Method for Open Set Recognition. Jaeyeon Jang & Chang Ouk Kim. IEEE 2021
- Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection. Hu, Hanzhe et al. CVPR 2021 [code]
- Compressing Visual-linguistic Model via Knowledge Distillation. Fang, Zhiyuan et al. arXiv:2104.02096
- Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification. Tian, Xudong et al. CVPR 2021
- Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation. Wang, Liwei et al. CVPR 2021
- Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose Estimation. Zhao, Zhongqiu et al. arXiv:2104.10414
- Boosting Light-Weight Depth Estimation Via Knowledge Distillation. Hu, Junjie et al. arXiv:2105.06143
- Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching. Wu, Bofeng et al. arXiv:2105.08252
- Revisiting Knowledge Distillation for Object Detection. Banitalebi-Dehkordi, Amin. arXiv:2105.10633
- Towards Compact Single Image Super-Resolution via Contrastive Self-distillation. Yanbo, Wang et al. IJCAI 2021
- How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting. Monti, Alessio et al. CVPR 2022
for NLP & Data-Mining
- Patient Knowledge Distillation for BERT Model Compression. Sun, Siqi et al. arXiv:1908.09355
- TinyBERT: Distilling BERT for Natural Language Understanding. Jiao, Xiaoqi et al. arXiv:1909.10351
- Learning to Specialize with Knowledge Distillation for Visual Question Answering. NeurIPS 2018
- Knowledge Distillation for Bilingual Dictionary Induction. EMNLP 2017
- A Teacher-Student Framework for Maintainable Dialog Manager. EMNLP 2018
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Sanh, Victor et al. arXiv:1910.01108
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Turc, Iulia et al. arXiv:1908.08962
- On Knowledge distillation from complex networks for response prediction. Arora, Siddhartha et al. NAACL 2019
- Distilling the Knowledge of BERT for Text Generation. arXiv:1911.03829v1
- Understanding Knowledge Distillation in Non-autoregressive Machine Translation. arXiv:1911.02727
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. Sun, Zhiqing et al. ACL 2020
- Acquiring Knowledge from Pre-trained Model to Neural Machine Translation. Weng, Rongxiang et al. AAAI 2020
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval. Lu, Wenhao et al. KDD 2020
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation. Xu, Yige et al. arXiv:2002.10345
- FastBERT: a Self-distilling BERT with Adaptive Inference Time. Liu, Weijie et al. ACL 2020
- LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression. Mao, Yihuan et al. arXiv:2004.04124
- DynaBERT: Dynamic BERT with Adaptive Width and Depth. Hou, Lu et al. NeurIPS 2020
- Structure-Level Knowledge Distillation For Multilingual Sequence Labeling. Wang, Xinyu et al. ACL 2020
- Distilled embedding: non-linear embedding factorization using knowledge distillation. Lioutas, Vasileios et al. arXiv:1910.06720
- TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER. Mukherjee & Awadallah. ACL 2020
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation. Sun, Haipeng et al. arXiv:2004.10171
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. Reimers, Nils & Gurevych, Iryna. arXiv:2004.09813
- Distilling Knowledge for Fast Retrieval-based Chat-bots. Tahami et al. arXiv:2004.11045
- Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language. ACL 2020
- Local Clustering with Mean Teacher for Semi-supervised Learning. arXiv:2004.09665
- Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher. arXiv:2004.08780
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders. arXiv:2005.13482
- Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation. arXiv:2003.02877
- Distilling Neural Networks for Faster and Greener Dependency Parsing. arXiv:2006.00844
- Distilling Knowledge from Well-informed Soft Labels for Neural Relation Extraction. AAAI 2020 [code]
- More Grounded Image Captioning by Distilling Image-Text Matching Model. Zhou, Yuanen et al. CVPR 2020
- Multimodal Learning with Incomplete Modalities by Knowledge Distillation. Wang, Qi et al. KDD 2020
- Distilling the Knowledge of BERT for Sequence-to-Sequence ASR. Futami, Hayato et al. arXiv:2008.03822
- Contrastive Distillation on Intermediate Representations for Language Model Compression. Sun, Siqi et al. EMNLP 2020 [code]
- Noisy Self-Knowledge Distillation for Text Summarization. arXiv:2009.07032
- Simplified TinyBERT: Knowledge Distillation for Document Retrieval. arXiv:2009.07531
- Autoregressive Knowledge Distillation through Imitation Learning. arXiv:2009.07253
- BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover’s Distance. EMNLP 2020 [code]
- Interpretable Embedding Procedure Knowledge Transfer. Seunghyun Lee et al. AAAI 2021 [code]
- LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding. Fu, Hao et al. AAAI 2021
- Towards Zero-Shot Knowledge Distillation for Natural Language Processing. Ahmad et al. arXiv:2012.15495
- Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains. Pan, Haojie et al. AAAI 2021
- Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation. Feng, Lingyun et al. AAAI 2021
- Label Confusion Learning to Enhance Text Classification Models. Guo, Biyang et al. AAAI 2021
- NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application. Wu, Chuhan et al. KDD 2021
for RecSys
- Developing Multi-Task Recommendations with Long-Term Rewards via Policy Distilled Reinforcement Learning. Liu, Xi et al. arXiv:2001.09595
- A General Knowledge Distillation Framework for Counterfactual Recommendation via Uniform Data. Liu, Dugang et al. SIGIR 2020 [Slides] [code]
- LightRec: a Memory and Search-Efficient Recommender System. Lian, Defu et al. WWW 2020
- Privileged Features Distillation at Taobao Recommendations. Xu, Chen et al. KDD 2020
- Next Point-of-Interest Recommendation on Resource-Constrained Mobile Devices. WWW 2020
- Adversarial Distillation for Efficient Recommendation with External Knowledge. Chen, Xu et al. ACM Trans, 2018
- Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System. Tang, Jiaxi et al. SIGKDD 2018
- A novel Enhanced Collaborative Autoencoder with knowledge distillation for top-N recommender systems. Pan, Yiteng et al. Neurocomputing 2019 [code]
- ADER: Adaptively Distilled Exemplar Replay Towards Continual Learning for Session-based Recommendation. Mi, Fei et al. ACM RecSys 2020
- Ensembled CTR Prediction via Knowledge Distillation. Zhu, Jieming et al.(Huawei) CIKM 2020
- DE-RRD: A Knowledge Distillation Framework for Recommender System. Kang, Seongku et al. CIKM 2020 [code]
- Neural Compatibility Modeling with Attentive Knowledge Distillation. Song, Xuemeng et al. SIGIR 2018
- Binarized Collaborative Filtering with Distilling Graph Convolutional Networks. Wang, Haoyu et al. IJCAI 2019
- Collaborative Distillation for Top-N Recommendation. Lee, Jae-woong et al. CIKM 2019
- Distilling Structured Knowledge into Embeddings for Explainable and Accurate Recommendation. Zhang, Yuan et al. WSDM 2020
- UMEC: Unified Model and Embedding Compression for Efficient Recommendation Systems. ICLR 2021
- Bidirectional Distillation for Top-K Recommender System. WWW 2021
- Privileged Graph Distillation for Cold-start Recommendation. SIGIR 2021
- Topology Distillation for Recommender System [KDD 2021]
- Conditional Attention Networks for Distilling Knowledge Graphs in Recommendation [CIKM 2021]
- Explore, Filter and Distill: Distilled Reinforcement Learning in Recommendation [CIKM 2021] [Video][Code]
- Graph Structure Aware Contrastive Knowledge Distillation for Incremental Learning in Recommender Systems [CIKM 2021]
- Conditional Graph Attention Networks for Distilling and Refining Knowledge Graphs in Recommendation [CIKM 2021]
- Target Interest Distillation for Multi-Interest Recommendation [CIKM 2022] [Video] [Code]
- KDCRec: Knowledge Distillation for Counterfactual Recommendation Via Uniform Data [TKDE 2022] [Code]
- Revisiting Graph based Social Recommendation: A Distillation Enhanced Social Graph Network [WWW 2022] [Code]
- False Negative Distillation and Contrastive Learning for Personalized Outfit Recommendation [arXiv:2110.06483]
- Dual Correction Strategy for Ranking Distillation in Top-N Recommender System [arXiv:2109.03459v1]
- Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search. Chen, Lei et al. [arXiv:2107.07173v1]
- Interpolative Distillation for Unifying Biased and Debiased Recommendation [SIGIR 2022] [Video] [Code]
- FedSPLIT: One-Shot Federated Recommendation System Based on Non-negative Joint Matrix Factorization and Knowledge Distillation [arXiv:2205.02359v1]
- On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation [SIGIR 2022] [Code]
- Cross-Task Knowledge Distillation in Multi-Task Recommendation [AAAI 2022]
- Toward Understanding Privileged Features Distillation in Learning-to-Rank [NeurIPS 2022]
- Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation [WISE 2022]
- Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings [SIGIR 2022] [Code]
- AutoFAS: Automatic Feature and Architecture Selection for Pre-Ranking System [KDD 2022]
- An Incremental Learning framework for Large-scale CTR Prediction [RecSys 2022]
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [WSDM 2023] [Code]
- Unbiased Knowledge Distillation for Recommendation [WSDM 2023] [Code]
- DistilledCTR: Accurate and scalable CTR prediction model through model distillation [ESWA 2022]
- Top-aware recommender distillation with deep reinforcement learning [Information Sciences 2021]
Model Pruning or Quantization
- Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression. ECCV 2016
- N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning. Ashok, Anubhav et al. ICLR 2018
- Slimmable Neural Networks. Yu, Jiahui et al. ICLR 2019
- Co-Evolutionary Compression for Unpaired Image Translation. Shu, Han et al. ICCV 2019
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. Liu, Zechun et al. ICCV 2019
- LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning. ICLR 2020
- Pruning with hints: an efficient framework for model acceleration. ICLR 2020
- Training convolutional neural networks with cheap convolutions and online distillation. arXiv:1909.13063
- Cooperative Pruning in Cross-Domain Deep Neural Network Compression. Chen, Shangyu et al. IJCAI 2019
- QKD: Quantization-aware Knowledge Distillation. Kim, Jangho et al. arXiv:1911.12491v1
- Neural Network Pruning with Residual-Connections and Limited-Data. Luo, Jian-Hao & Wu, Jianxin. CVPR 2020
- Training Quantized Neural Networks with a Full-precision Auxiliary Module. Zhuang, Bohan et al. CVPR 2020
- Towards Effective Low-bitwidth Convolutional Neural Networks. Zhuang, Bohan et al. CVPR 2018
- Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations. Zhuang, Bohan et al. arXiv:1908.04680
- Paying more attention to snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation. Le et al. arXiv:2006.11487 [code]
- Knowledge Distillation Beyond Model Compression. Choi, Arthur et al. arXiv:2007.01493
- Distillation Guided Residual Learning for Binary Convolutional Neural Networks. Ye, Jianming et al. ECCV 2020
- Cascaded channel pruning using hierarchical self-distillation. Miles & Mikolajczyk. BMVC 2020
- TernaryBERT: Distillation-aware Ultra-low Bit BERT. Zhang, Wei et al. EMNLP 2020
- Weight Distillation: Transferring the Knowledge in Neural Network Parameters. arXiv:2009.09152
- Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks. Boo, Yoonho et al. AAAI 2021
- Binary Graph Neural Networks. Bahri, Mehdi et al. CVPR 2021
- Self-Damaging Contrastive Learning. Jiang, Ziyu et al. ICML 2021
- Information Theoretic Representation Distillation. Miles et al. BMVC 2022 [code]
Beyond
- Do deep nets really need to be deep? Ba, Jimmy and Caruana, Rich. NeurIPS 2014
- When Does Label Smoothing Help? Müller, Rafael, Kornblith, and Hinton. NeurIPS 2019
- Towards Understanding Knowledge Distillation. Phuong, Mary and Lampert, Christoph. ICML 2019
- Harnessing deep neural networks with logical rules. ACL 2016
- Adaptive Regularization of Labels. Ding, Qianggang et al. arXiv:1908.05474
- Knowledge Isomorphism between Neural Networks. Liang, Ruofan et al. arXiv:1908.01581
- (survey) Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation. arXiv:1912.13179
- Understanding and Improving Knowledge Distillation. Tang, Jiaxi et al. arXiv:2002.03532
- The State of Knowledge Distillation for Classification. Ruffy, Fabian and Chahal, Karanbir. arXiv:1912.10850 [code]
- Explaining Knowledge Distillation by Quantifying the Knowledge. Zhang, Quanshi et al. CVPR 2020
- DeepVID: deep visual interpretation and diagnosis for image classifiers via knowledge distillation. IEEE Trans, 2019.
- On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime. Rahbar, Arman et al. arXiv:2003.13438
- (survey) Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks. Wang, Lin & Yoon, Kuk-Jin. arXiv:2004.05937
- Why distillation helps: a statistical perspective. arXiv:2005.10419
- Transferring Inductive Biases through Knowledge Distillation. Abnar, Samira et al. arXiv:2006.00555
- Does label smoothing mitigate label noise? Lukasik, Michal et al. ICML 2020
- An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation. Das, Deepan et al. arXiv:2006.03810
- (survey) Knowledge Distillation: A Survey. Gou, Jianping et al. IJCV 2021
- Does Adversarial Transferability Indicate Knowledge Transferability? Liang, Kaizhao et al. arXiv:2006.14512
- On the Demystification of Knowledge Distillation: A Residual Network Perspective. Jha et al. arXiv:2006.16589
- Enhancing Simple Models by Exploiting What They Already Know. Dhurandhar et al. ICML 2020
- Feature-Extracting Functions for Neural Logic Rule Learning. Gupta & Robles-Kelly. arXiv:2008.06326
- On the Orthogonality of Knowledge Distillation with Other Techniques: From an Ensemble Perspective. SeongUk et al. arXiv:2009.04120
- Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher. Ji, Guangda & Zhu, Zhanxing. NeurIPS 2020
- In Defense of Feature Mimicking for Knowledge Distillation. Wang, Guo-Hua et al. arXiv:2011.0142
- Solvable Model for Inheriting the Regularization through Knowledge Distillation. Luca Saglietti & Lenka Zdeborova. arXiv:2012.00194
- Undistillable: Making A Nasty Teacher That CANNOT Teach Students. ICLR 2021
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. Allen-Zhu, Zeyuan & Li, Yuanzhi (Microsoft). arXiv:2012.09816
- Student-Teacher Learning from Clean Inputs to Noisy Inputs. Hong, Guanzhe et al. CVPR 2021
- Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study. ICLR 2021 [project]
- Model Distillation for Revenue Optimization: Interpretable Personalized Pricing. Biggs, Max et al. ICML 2021
- A statistical perspective on distillation. Aditya et al. (Google). ICML 2021
- (survey) Data-Free Knowledge Transfer: A Survey. Liu, Yuang et al. arXiv:2112.15278
- Knowledge Distillation Beyond Model Compression. Choi, Sarfraz et al. arXiv:2007.01493
Distiller Tools
- Neural Network Distiller: A Python Package For DNN Compression Research. arXiv:1910.12232
- TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing. HIT and iFLYTEK. arXiv:2002.12620
- torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation.
- KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization. Shah, Het et al. arXiv:2011.14691
- Knowledge-Distillation-Zoo
- RepDistiller
- classification distiller
Note: PDFs of all papers can be found and downloaded via arXiv, Bing, or Google.
Source: https://github.com/FLHonker/Awesome-Knowledge-Distillation
Thanks to all contributors:
<img src="https://avatars.githubusercontent.com/u/21128481?s=28&v=4" width = "28" height = "28" alt="avatar" /> <img src="https://avatars.githubusercontent.com/u/8179405?s=28&v=4" width = "28" height = "28" alt="avatar" /> <img src="https://avatars.githubusercontent.com/u/15208588?s=28v=4" width = "28" height = "28" alt="avatar" /> <img src="https://avatars.githubusercontent.com/u/23656119?s=28v=4" width = "28" height = "28" alt="avatar" /> <img src="https://avatars.githubusercontent.com/u/19531609?v=4" width = "28" height = "28" alt="avatar" />
Contact: Yuang Liu (frankliu624@outlook.com)