MAE-Lite
A Closer Look at Self-Supervised Lightweight Vision Transformers
Shaoru Wang, Jin Gao*, Zeming Li, Xiaoqin Zhang, Weiming Hu
ICML 2023
News
- 2023.5: Code & models are released!
- 2023.4: Our paper is accepted by ICML 2023!
- 2022.5: The initial version of our paper was published on arXiv.
Introduction
MAE-Lite focuses on exploring the pre-training of lightweight Vision Transformers (ViTs). This repo provides the code and models for the studies in the paper.
- We provide advanced pre-training (based on MAE) and fine-tuning recipes for lightweight ViTs and demonstrate that even vanilla lightweight ViTs (e.g., ViT-Tiny) beat most previous SOTA ConvNets and ViT derivatives with delicate network architecture designs. We achieve 79.0% top-1 accuracy on ImageNet with a vanilla ViT-Tiny (5.7M).
- We provide code for the transfer evaluation of pre-trained models on several classification tasks (e.g., Oxford 102 Flower, Oxford-IIIT Pet, FGVC Aircraft, CIFAR) and COCO detection tasks (based on ViTDet). We find that self-supervised pre-trained ViTs perform worse than their supervised pre-trained counterparts on data-insufficient downstream tasks.
- We provide code for the analysis tools used in the paper to examine the layer representations and the attention distance & entropy of ViTs.
- We provide code and models for our proposed knowledge distillation method for pre-trained lightweight ViTs based on MAE, which shows superiority in the transfer evaluation on data-insufficient classification tasks and dense prediction tasks.
Getting Started
Installation
Setup conda environment:
# Create environment
conda create -n mae-lite python=3.7 -y
conda activate mae-lite
# Install requirements
conda install pytorch==1.9.0 torchvision==0.10.0 -c pytorch -y
# Clone MAE-Lite
git clone https://github.com/wangsr126/mae-lite.git
cd mae-lite
# Install other requirements
pip3 install -r requirements.txt
python3 setup.py build develop --user
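After installation, a quick check like the following (a minimal sketch, not part of the repo) can confirm the environment imports the pinned versions correctly:

```python
# Quick environment sanity check (a sketch; not part of the repo).
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expected: 1.9.0 / 0.10.0
print("CUDA available:", torch.cuda.is_available())
```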
Data Preparation
Prepare the ImageNet data in <BASE_FOLDER>/data/imagenet/imagenet_train and <BASE_FOLDER>/data/imagenet/imagenet_val.
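A small sketch like the one below can verify the data layout, assuming the standard one-sub-folder-per-class ImageNet structure (an assumption here, not stated by the repo); `<BASE_FOLDER>` is a placeholder you substitute yourself.

```python
# Sanity-check the ImageNet folders (assumes one sub-folder per class;
# replace <BASE_FOLDER> with your actual path).
import os
from torchvision.datasets import ImageFolder

base = "<BASE_FOLDER>"  # placeholder path
for split in ("imagenet_train", "imagenet_val"):
    root = os.path.join(base, "data", "imagenet", split)
    ds = ImageFolder(root)
    print(f"{root}: {len(ds.classes)} classes, {len(ds)} images")
```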
Pre-Training
To pre-train ViT-Tiny with our recommended MAE recipe:
# batch size 4096 on 8 GPUs:
cd projects/mae_lite
ssl_train -b 4096 -d 0-7 -e 400 -f mae_lite_exp.py --amp \
--exp-options exp_name=mae_lite/mae_tiny_400e
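For intuition, the core of MAE-style pre-training is reconstructing images from a small random subset of visible patches. The snippet below is a conceptual sketch of the random masking step only (generic PyTorch, not the repo's implementation in `mae_lite_exp.py`, which may differ in details):

```python
# Conceptual sketch of MAE-style random patch masking (illustration only).
import torch

def random_masking(patches, mask_ratio=0.75):
    """patches: (B, N, D) patch embeddings; keep a random (1 - mask_ratio) subset."""
    B, N, D = patches.shape
    len_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                # per-patch random scores
    ids_shuffle = noise.argsort(dim=1)      # random permutation per sample
    ids_keep = ids_shuffle[:, :len_keep]    # indices of visible patches
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_keep

x = torch.randn(2, 196, 192)   # e.g., ViT-Tiny: 14x14 patches, embed dim 192
vis, keep = random_masking(x)
print(vis.shape)               # (2, 49, 192) with the default 0.75 mask ratio
```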
Fine-Tuning on ImageNet
Please download the pre-trained models, e.g., download MAE-Tiny to <BASE_FOLDER>/checkpoints/mae_tiny_400e.pth.tar.
To fine-tune with the improved recipe:
# batch size 1024 on 8 GPUs:
cd projects/eval_tools
ssl_train -b 1024 -d 0-7 -e 300 -f finetuning_exp.py --amp \
[--ckpt <checkpoint-path>] --exp-options pretrain_exp_name=mae_lite/mae_tiny_400e
<checkpoint-path>: if set to <BASE_FOLDER>/checkpoints/mae_tiny_400e.pth.tar, it will be loaded as the initialization; if not set, the checkpoint at <BASE_FOLDER>/outputs/mae_lite/mae_tiny_400e/last_epoch_ckpt.pth.tar will be loaded automatically.
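If you want to peek inside a downloaded checkpoint before fine-tuning, something like the sketch below works; the internal key layout of the `.pth.tar` file is an assumption here, so treat the printed keys as a guide rather than a specification.

```python
# Inspect a downloaded checkpoint (a sketch; key names inside the file are an assumption).
import torch

ckpt_path = "<BASE_FOLDER>/checkpoints/mae_tiny_400e.pth.tar"  # placeholder path
state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    # Checkpoints typically nest the weights under a key such as "model" or "state_dict".
    print(list(state.keys()))
```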
Evaluation of fine-tuned models
Download MAE-Tiny-FT to <BASE_FOLDER>/checkpoints/mae_tiny_400e_ft_300e.pth.tar.
# batch size 1024 on 1 GPU:
python mae_lite/tools/eval.py -b 1024 -d 0 -f projects/eval_tools/finetuning_exp.py \
--ckpt <BASE_FOLDER>/checkpoints/mae_tiny_400e_ft_300e.pth.tar \
--exp-options pretrain_exp_name=mae_lite/mae_tiny_400e/ft_eval
You should get "Top1: 77.978" if everything is set up correctly.
Download MAE-Tiny-FT-RPE to <BASE_FOLDER>/checkpoints/mae_tiny_400e_ft_rpe_1000e.pth.tar.
# batch size 1024 on 1 GPU:
python mae_lite/tools/eval.py -b 1024 -d 0 -f projects/eval_tools/finetuning_rpe_exp.py \
--ckpt <BASE_FOLDER>/checkpoints/mae_tiny_400e_ft_rpe_1000e.pth.tar \
--exp-options pretrain_exp_name=mae_lite/mae_tiny_400e/ft_rpe_eval
You should get "Top1: 79.002" if everything is set up correctly.
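For reference, the reported Top1 number corresponds to a standard top-1 accuracy computation like the sketch below (generic PyTorch, not the repo's actual `eval.py`, which builds the model and data from the experiment files shown above):

```python
# Generic top-1 accuracy loop (illustrative only).
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model.eval().to(device)
    correct, total = 0, 0
    for images, targets in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1).cpu() == targets).sum().item()
        total += targets.numel()
    return 100.0 * correct / total
```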
Pre-Training with Distillation
Please refer to DISTILL.md.
Transfer to Other Datasets
Please refer to TRANSFER.md.
Transfer to Detection Tasks
Please refer to DETECTION.md.
Experiments of MoCo-v3
Please refer to MOCOV3.md.
Models Analysis Tools
Please refer to VISUAL.md.
Main Results
| pre-train code | pre-train epochs | fine-tune recipe | fine-tune epochs | accuracy | ckpt |
| --- | --- | --- | --- | --- | --- |
| - | - | impr. | 300 | 75.8 | link |
| mae_lite | 400 | - | - | - | link |
| mae_lite | 400 | impr. | 300 | 78.0 | link |
| mae_lite | 400 | impr.+RPE | 1000 | 79.0 | link |
| mae_lite_distill | 400 | - | - | - | link |
| mae_lite_distill | 400 | impr. | 300 | 78.4 | link |
Citation
Please cite the following paper if this repo helps your research:
@article{wang2023closer,
title={A Closer Look at Self-Supervised Lightweight Vision Transformers},
author={Shaoru Wang and Jin Gao and Zeming Li and Xiaoqin Zhang and Weiming Hu},
journal={arXiv preprint arXiv:2205.14443},
year={2023},
}
Acknowledgement
We thank timm, MAE, and MoCo-v3 for their code implementations.
License
This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.