

[WACV 2024] Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

This is the official codebase for our paper "Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders" presented at WACV 2024. The paper can be viewed at this link.

Overview of self-supervised auxiliary task (SSAT)


Create the conda environment and install the necessary packages:

conda env create -f environment.yml -n limiteddatavit

or alternatively

conda create -n limiteddatavit python=3.7 -y
conda activate limiteddatavit
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

Data preparation

We provide code for training on ImageNet, CIFAR10, and CIFAR100. CIFAR10 and CIFAR100 are downloaded automatically through torchvision; ImageNet must be downloaded separately.

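Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout for torchvision's datasets.ImageFolder, with the training and validation data expected in the train/ and val/ folders respectively (class and file names below are placeholders):

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg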
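
Before launching a long run, it can help to confirm that the extracted data matches this layout. The following is a minimal sketch using only the Python standard library; `check_imagefolder_layout` is a hypothetical helper and is not part of this repository. It only checks that train/ and val/ exist and that each contains non-empty class subfolders.

```python
import tempfile
from pathlib import Path


def check_imagefolder_layout(root):
    """Return a list of problems found under root (empty list = layout looks fine).

    Hypothetical helper, not part of the repository.
    """
    problems = []
    root = Path(root)
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing split folder: {split}/")
            continue
        # ImageFolder expects one subfolder per class inside each split.
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        if not class_dirs:
            problems.append(f"no class subfolders in {split}/")
        for class_dir in class_dirs:
            if not any(f.is_file() for f in class_dir.iterdir()):
                problems.append(f"empty class folder: {split}/{class_dir.name}")
    return problems


# Demo on a tiny fake dataset built in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        class_dir = Path(tmp) / split / "class1"
        class_dir.mkdir(parents=True)
        (class_dir / "img1.jpeg").touch()
    print(check_imagefolder_layout(tmp))
```

In practice you would call `check_imagefolder_layout("/path/to/imagenet/")` and inspect any messages it returns before starting training.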


Pretrained model weights

| Model | Dataset | Evaluation Command |
| --- | --- | --- |
| ViT-T + SSAT (weights) | ImageNet-1k | python main_two_branch.py --data_path /path/to/imagenet/ --resume vittiny-ssat_imagenet1k_weights.pth --eval --model mae_vit_tiny |
| ViT-S + SSAT (weights) | ImageNet-1k | python main_two_branch.py --data_path /path/to/imagenet/ --resume vitsmall-ssat_imagenet1k_weights.pth --eval --model mae_vit_small |

Training models

To train ViT-Tiny with the self-supervised auxiliary task (SSAT) on ImageNet-1k using 8 GPUs, run the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 main_two_branch.py --data_path /path/to/imagenet/ --output_dir ./output_dir --epochs 100 --model mae_vit_tiny
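
When adapting this command to a different number of GPUs, the device list and `--nproc_per_node` need to stay in sync. The sketch below builds the command string from a GPU count. `build_launch_command` is a hypothetical helper, not part of the repository; it assumes GPUs are indexed from 0 and uses only the flags shown above. Whether other hyperparameters should change with the GPU count is not covered here.

```python
import shlex


def build_launch_command(num_gpus, data_path, output_dir="./output_dir",
                         epochs=100, model="mae_vit_tiny"):
    """Build the training command as a string (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Assumption: use the first num_gpus devices, numbered from 0.
    devices = ",".join(str(i) for i in range(num_gpus))
    args = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "main_two_branch.py",
        "--data_path", data_path,
        "--output_dir", output_dir,
        "--epochs", str(epochs),
        "--model", model,
    ]
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    return f"CUDA_VISIBLE_DEVICES={devices} " + " ".join(shlex.quote(a) for a in args)


print(build_launch_command(8, "/path/to/imagenet/"))
print(build_launch_command(2, "c10", model="mae_vit_small"))
```

The first call reproduces the 8-GPU command above; the second shows a 2-GPU variant for CIFAR10 with ViT-Small.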

Valid values for --data_path are /path/to/imagenet (the ImageNet root directory), c10 (CIFAR10), and c100 (CIFAR100). Other datasets can be added in utils/datasets.py.
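
One plausible reading of these `--data_path` values is a small lookup: short names select a dataset that torchvision downloads automatically, and anything else is treated as a directory path. The sketch below illustrates that idea only. It is not the code in utils/datasets.py, and `DATASET_SHORTCUTS` and `describe_data_path` are hypothetical names. It also assumes that any non-shortcut path points to ImageNet-1k.

```python
# Hypothetical mapping from shortcut to (dataset name, number of classes).
DATASET_SHORTCUTS = {
    "c10": ("CIFAR10", 10),
    "c100": ("CIFAR100", 100),
}


def describe_data_path(data_path):
    """Describe how a --data_path value could be interpreted (illustrative only)."""
    if data_path in DATASET_SHORTCUTS:
        name, num_classes = DATASET_SHORTCUTS[data_path]
        # CIFAR datasets are fetched automatically, so no local root is needed.
        return {"dataset": name, "num_classes": num_classes, "auto_download": True}
    # Assumption: every other value is the root of an ImageNet-1k folder.
    return {"dataset": "ImageNet-1k", "num_classes": 1000,
            "auto_download": False, "root": data_path}


for value in ("c10", "c100", "/path/to/imagenet"):
    print(value, "->", describe_data_path(value))
```

Under this pattern, supporting another dataset would amount to adding one more entry to the lookup together with its loading logic.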

Available arguments for --model are mae_vit_tiny, mae_vit_small, mae_vit_base, mae_vit_large, mae_vit_huge.

Citation & Acknowledgement

    title={Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders},
    author={Srijan Das and Tanmay Jain and Dominick Reilly and Pranav Balaji and Soumyajit Karmakar and Shyam Marjit and Xiang Li and Abhijit Das and Michael Ryoo},
    journal={2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},

This repository is built on top of the code for the paper Masked Autoencoders Are Scalable Vision Learners from Meta Research.