Exploring Efficient Few-shot Adaptation for Vision Transformers

This is the official PyTorch implementation of eTT. Our paper is available at link.

Data Preparation

This repo adopts the same data structure as TSA; please follow the original data preparation instructions from that repo. We thank the authors of TSA for their contribution.

Training & Inference

To run this code you need a DINO-pretrained network weight. We recommend re-running the original DINO using the meta-train set of ImageNet as training data. To do this, clone the DINO repo, copy all files from pretrain_code_snippet into the DINO folder, and run the training script. In detail,

git clone https://github.com/facebookresearch/dino.git

cp -rf pretrain_code_snippet/* dino/

python -m torch.distributed.launch --nproc_per_node=8 main_dino_metadataset.py --arch {ARCH} --data_path /path/to/imagenet/train --output_dir /path/to/saving_dir

This is the main setting in our paper. Technically speaking you can also use the 1000-class DINO weight provided in the original repo for the experiments.
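
If you take that route, you can grab the public 1000-class DINO ViT-S/16 weight via torch.hub and dump it to a local checkpoint file. The snippet below is only a minimal sketch: the save path is a placeholder, and whether the resulting state_dict layout matches what test_extractor_pa_vit_prefix.py expects from --model.ckpt is an assumption, so you may need to adapt the keys.

import torch

# Fetch the official 1000-class DINO ViT-S/16 weights from the DINO model zoo.
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')

# Save a plain state_dict; adjust the path (and, if necessary, the key layout)
# to match what --model.ckpt expects. The filename here is a placeholder.
torch.save(model.state_dict(), '/path/to/saving_dir/dino_vits16_pretrain.pth')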

After getting the pretrained weight, you can run meta-testing as follows:

python test_extractor_pa_vit_prefix.py --data.test ilsvrc_2012 omniglot aircraft cu_birds dtd quickdraw fungi vgg_flower traffic_sign mscoco --model.ckpt {WEIGHT PATH}

The code uses ViT-Small as the default backbone; the structure can be modified according to your requirements.
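
Before launching meta-testing, it can help to sanity-check the checkpoint you pass to --model.ckpt, especially if you changed the backbone. The sketch below only inspects the file; the path is a placeholder, and the 'teacher'/'student' key names follow the checkpoints saved by DINO training and are an assumption about your particular weight file.

import torch

# Load the checkpoint on CPU just to inspect its contents (path is a placeholder).
ckpt = torch.load('/path/to/saving_dir/checkpoint.pth', map_location='cpu')

# DINO training checkpoints usually bundle 'student'/'teacher' state dicts with
# optimizer state; print the top-level keys to see what is available.
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))

# Peek at a few parameter names and shapes from the teacher branch, if present.
if isinstance(ckpt, dict) and 'teacher' in ckpt:
    for name, tensor in list(ckpt['teacher'].items())[:5]:
        print(name, tuple(tensor.shape))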

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@article{xu2023exploring,
  title={Exploring efficient few-shot adaptation for vision transformers},
  author={Xu, Chengming and Yang, Siqian and Wang, Yabiao and Wang, Zhanxiong and Fu, Yanwei and Xue, Xiangyang},
  journal={arXiv preprint arXiv:2301.02419},
  year={2023}
}

Acknowledgement

Our code is modified from TSA.