Efficient Training of Visual Transformers with Small Datasets

To appear in NeurIPS 2021.

[paper][Poster & Video][arXiv][code] [reviews] <br> Yahui Liu<sup>1,3</sup>, Enver Sangineto<sup>1</sup>, Wei Bi<sup>2</sup>, Nicu Sebe<sup>1</sup>, Bruno Lepri<sup>3</sup>, Marco De Nadai<sup>3</sup> <br> <sup>1</sup>University of Trento, Italy, <sup>2</sup>Tencent AI Lab, China, <sup>3</sup>Bruno Kessler Foundation, Italy. <br>

Data preparation

| Dataset | Download Link |
| --- | --- |
| ImageNet | train, val |
| CIFAR-10 | all |
| CIFAR-100 | all |
| SVHN | train, test, extra |
| Oxford-Flower102 | images, labels, splits |
| Clipart | images, train_list, test_list |
| Infograph | images, train_list, test_list |
| Painting | images, train_list, test_list |
| Quickdraw | images, train_list, test_list |
| Real | images, train_list, test_list |
| Sketch | images, train_list, test_list |
After downloading, organize each dataset in the following folder structure:

```
dataset_name
  |__train
  |    |__category1
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__category2
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__...
  |__val
       |__category1
       |    |__xxx.jpg
       |    |__...
       |__category2
       |    |__xxx.jpg
       |    |__...
       |__...
```

Training

After preparing the datasets, we can start the training with 8 NVIDIA V100 GPUs:

```shell
sh train.sh
```
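The contents of `train.sh` are not reproduced here. As a rough command-line sketch only (the entry point `main.py`, the config path, and the script-specific flags are assumptions borrowed from the Swin-Transformer conventions and may differ in this repository), an 8-GPU distributed launch typically looks like:

```shell
# Hypothetical 8-GPU launch; main.py, the config path, and its flags
# are assumptions, not this repository's actual script contents.
python -m torch.distributed.launch --nproc_per_node 8 \
    main.py \
    --cfg configs/swin_tiny_patch4_window7_224.yaml \
    --data-path /path/to/dataset_name \
    --batch-size 128
```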

Evaluation

We can also load a pre-trained model and evaluate its performance:

```shell
sh eval.sh
```
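Evaluation reports top-1 classification accuracy, as in the results table below. As a minimal, framework-free sketch of the metric (the function name is ours, not the repository's):

```python
def top1_accuracy(predictions, labels):
    """Percentage of samples whose predicted class matches the label.

    predictions, labels: equal-length sequences of class indices.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# Example: 3 of 4 predictions correct -> 75.0% top-1 accuracy
print(top1_accuracy([0, 1, 2, 2], [0, 1, 2, 3]))  # -> 75.0
```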

Pretrained models

For fast evaluation, we present the results of Swin-T trained for 100 epochs on various datasets as an example (note that we save the model every 5 epochs during training, so the attached best models may be slightly different from the reported performances).

| Datasets | Baseline | Ours |
| --- | --- | --- |
| CIFAR-10 | 59.47 | 83.89 |
| CIFAR-100 | 53.28 | 66.23 |
| SVHN | 71.60 | 94.23 |
| Flowers102 | 34.51 | 39.37 |
| Clipart | 38.05 | 47.47 |
| Infograph | 8.20 | 10.16 |
| Painting | 35.92 | 41.86 |
| Quickdraw | 24.08 | 69.41 |
| Real | 73.47 | 75.59 |
| Sketch | 11.97 | 38.55 |
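The absolute top-1 gains over the baselines can be computed directly from the numbers reported above (the dictionary below simply restates the table):

```python
# Top-1 accuracy (%) from the results table: (baseline, ours)
results = {
    "CIFAR-10":   (59.47, 83.89),
    "CIFAR-100":  (53.28, 66.23),
    "SVHN":       (71.60, 94.23),
    "Flowers102": (34.51, 39.37),
    "Clipart":    (38.05, 47.47),
    "Infograph":  (8.20, 10.16),
    "Painting":   (35.92, 41.86),
    "Quickdraw":  (24.08, 69.41),
    "Real":       (73.47, 75.59),
    "Sketch":     (11.97, 38.55),
}

# Absolute improvement in percentage points per dataset.
gains = {name: round(ours - base, 2) for name, (base, ours) in results.items()}
print(gains["Quickdraw"])  # largest gain: 45.33
```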

We provide a script to download the pretrained models from Google Drive directly:

```shell
python3 ./scripts/collect_models.py
```

Related Work

Acknowledgments

This code is heavily based on Swin-Transformer. Thanks to the contributors of that project.

Citation

```bibtex
@InProceedings{liu2021efficient,
    author    = {Liu, Yahui and Sangineto, Enver and Bi, Wei and Sebe, Nicu and Lepri, Bruno and De Nadai, Marco},
    title     = {Efficient Training of Visual Transformers with Small Datasets},
    booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
    year      = {2021}
}
```

If you have any questions, please do not hesitate to contact me (yahui.cvrs AT gmail.com).