
Masked Autoencoders Enable Efficient Knowledge Distillers

This is a PyTorch implementation of the DMAE paper.

<div align="center"> <img src="dmae_teaser.png"/> </div>

Preparation

Install PyTorch and prepare the ImageNet dataset following the official PyTorch ImageNet training code. Please refer to the official MAE codebase for other environment requirements.
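
As a quick sanity check (not part of this repo), the snippet below verifies that the dataset is laid out the way the official PyTorch ImageNet example and the MAE codebase expect, i.e. class sub-folders under `train/` and `val/`. The `data_path` is a placeholder for your local copy.

```python
import os
from torchvision import datasets, transforms

# Placeholder path; point this at your local ImageNet-1k copy, laid out as
#   <data_path>/train/<class>/*.JPEG and <data_path>/val/<class>/*.JPEG
data_path = "/path/to/imagenet"

train_set = datasets.ImageFolder(
    os.path.join(data_path, "train"),
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)
print(f"{len(train_set)} training images across {len(train_set.classes)} classes")
```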

Pre-Training

This implementation only supports multi-gpu, DistributedDataParallel training, which is faster and simpler; single-gpu or DataParallel training is not supported.

To pre-train models on an 8-gpu machine, first download the ViT-Large model to serve as the teacher, and then run:

bash pretrain.sh
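
For orientation only, here is a minimal sketch of the idea behind distilled MAE pre-training: the student is trained with the usual masked-patch reconstruction loss while its intermediate features are aligned to those of the frozen ViT-Large teacher. The names `student`, `teacher`, `align_proj`, and the loss weight are hypothetical placeholders, not this repo's API; the actual training recipe lives in pretrain.sh and the code it launches.

```python
import torch
import torch.nn.functional as F

def dmae_style_loss(student, teacher, align_proj, images,
                    mask_ratio=0.75, distill_weight=1.0):
    """Sketch of MAE reconstruction + feature distillation (names are placeholders).

    `student` is assumed to return (reconstruction_loss, intermediate_features),
    `teacher` is a frozen pre-trained ViT-Large returning its intermediate
    features, and `align_proj` projects student features to the teacher's width.
    """
    # Standard MAE pixel-reconstruction loss on the masked patches.
    recon_loss, student_feats = student(images, mask_ratio=mask_ratio)

    # Frozen teacher features on the same images (no gradients).
    with torch.no_grad():
        teacher_feats = teacher(images)

    # Align student features to the teacher's and penalize the mismatch.
    distill_loss = F.mse_loss(align_proj(student_feats), teacher_feats)

    return recon_loss + distill_weight * distill_loss
```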

Finetuning

To finetune models on an 8-gpu machine, run:

bash finetune.sh
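
Fine-tuning starts from the pre-trained encoder weights rather than from scratch. The sketch below shows one common way to initialize a ViT-Base classifier from such a checkpoint using timm; the checkpoint path and the "model" key are assumptions (the MAE codebase stores weights under that key), and finetune.sh handles this step for you in the actual pipeline.

```python
import torch
import timm

# Placeholder path to a pre-trained DMAE checkpoint.
ckpt_path = "dmae_pretrain_vit_base.pth"

# Fresh ViT-Base classifier; the classification head is newly initialized.
model = timm.create_model("vit_base_patch16_224", num_classes=1000)

# MAE-style checkpoints usually keep the weights under a "model" key
# (an assumption here); strict=False skips decoder/head mismatches.
checkpoint = torch.load(ckpt_path, map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```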

Models

The checkpoints of our pre-trained and finetuned ViT-Base on ImageNet-1k can be downloaded as follows:

| Pretrained Model | Checkpoint    | Epochs |
| ---------------- | ------------- | ------ |
| ViT-Base         | download link | 100    |

| Finetuned Model | Checkpoint    | Acc (%) |
| --------------- | ------------- | ------- |
| ViT-Base        | download link | 84.0    |
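
To evaluate the released finetuned ViT-Base, you can load its weights into a standard ViT-Base and run it in eval mode. The path, the "model" key, and the use of timm below are assumptions for illustration; the released checkpoint may instead require the MAE-style model definitions from this codebase.

```python
import torch
import timm

# Placeholder path to the finetuned ViT-Base checkpoint from the table above.
ckpt_path = "dmae_finetuned_vit_base.pth"

model = timm.create_model("vit_base_patch16_224", num_classes=1000)
checkpoint = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(checkpoint.get("model", checkpoint), strict=False)
model.eval()

# Dummy forward pass: one 224x224 image -> 1000 ImageNet logits.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```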

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Acknowledgment

This work is partially supported by the TPU Research Cloud (TRC) program and the Google Cloud Research Credits program.

Citation

@inproceedings{bai2022masked,
  title     = {Masked autoencoders enable efficient knowledge distillers},
  author    = {Bai, Yutong and Wang, Zeyu and Xiao, Junfei and Wei, Chen and Wang, Huiyu and Yuille, Alan and Zhou, Yuyin and Xie, Cihang},
  booktitle = {CVPR},
  year      = {2023}
}