Efficient Training of Visual Transformers with Small Datasets
To appear in NeurIPS 2021.
[paper][Poster & Video][arXiv][code] [reviews] <br> Yahui Liu<sup>1,3</sup>, Enver Sangineto<sup>1</sup>, Wei Bi<sup>2</sup>, Nicu Sebe<sup>1</sup>, Bruno Lepri<sup>3</sup>, Marco De Nadai<sup>3</sup> <br> <sup>1</sup>University of Trento, Italy, <sup>2</sup>Tencent AI Lab, China, <sup>3</sup>Bruno Kessler Foundation, Italy. <br>
Data preparation
- Download the datasets and pre-process some of them (i.e., ImageNet and DomainNet) using the code in the `scripts` folder.
- Organize the datasets with the following structure (except CIFAR-10/100 and SVHN):
dataset_name
|__train
| |__category1
| | |__xxx.jpg
| | |__...
| |__category2
| | |__xxx.jpg
| | |__...
| |__...
|__val
|__category1
| |__xxx.jpg
| |__...
|__category2
| |__xxx.jpg
| |__...
|__...
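Before launching a long run, it can help to check the folders with a few lines of standard-library Python. This is a minimal sketch, not part of the repository: `check_layout`, `count_images` and the extensions in `IMAGE_EXTS` are hypothetical names and choices. It only verifies that `train` and `val` exist, that each holds one folder per category (the class-per-folder layout that torchvision's `ImageFolder` reads), and that both splits list the same categories.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical choice: file extensions counted as images. Adjust to your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images(split_dir: Path) -> dict[str, int]:
    """Map each category folder under split_dir to its number of image files."""
    counts = {}
    for category in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[category.name] = sum(
            1
            for f in category.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
    return counts


def check_layout(root: str) -> dict[str, dict[str, int]]:
    """Check that root/train and root/val exist and contain the same categories."""
    root_path = Path(root)
    stats = {}
    for split in ("train", "val"):
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        stats[split] = count_images(split_dir)
    # Every class seen in training should also appear in validation, and vice versa.
    if set(stats["train"]) != set(stats["val"]):
        raise ValueError("train and val contain different category folders")
    return stats
```

For example, `check_layout("dataset_name")` returns the per-category image counts for each split and raises an error if a split folder is missing or the category sets differ.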
Training
After preparing the datasets, start training on 8 NVIDIA V100 GPUs with:
sh train.sh
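The effective learning rate may depend on how many GPUs are used. Swin-Transformer, on which this code is based, scales its base learning rate linearly with the global batch size (per-GPU batch size × number of GPUs × gradient-accumulation steps, divided by 512). Assuming this repository keeps that rule (not confirmed here; check the config used by `train.sh`), the sketch below shows the arithmetic and how gradient accumulation can keep the learning rate unchanged on fewer GPUs. The function name and all numbers in it are hypothetical.

```python
def linear_scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int,
                     accumulation_steps: int = 1, reference_batch: int = 512) -> float:
    """Scale base_lr linearly with the effective (global) batch size.

    Assumed to mirror Swin-Transformer's rule:
        lr = base_lr * batch_size_per_gpu * num_gpus * accumulation_steps / 512
    """
    effective_batch = batch_size_per_gpu * num_gpus * accumulation_steps
    return base_lr * effective_batch / reference_batch


# Hypothetical values: 128 images per GPU on 8 GPUs...
lr_8gpu = linear_scaled_lr(5e-4, 128, 8)
# ...and the same effective batch on 4 GPUs with 2 accumulation steps.
lr_4gpu = linear_scaled_lr(5e-4, 128, 4, accumulation_steps=2)
print(lr_8gpu, lr_4gpu)
```

Both calls see an effective batch of 1024 images, so they yield the same learning rate; halving the GPUs without accumulation would halve it.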
Evaluation
We can also load the pre-trained model and test the performance:
sh eval.sh
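Assuming `eval.sh` follows Swin-Transformer's validation loop, which logs top-1 and top-5 accuracy, the reported metric can be sketched in plain Python as follows. `topk_accuracy` is a hypothetical helper, not a function from the repository; ties between class scores go to the lower class index.

```python
from __future__ import annotations

from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Return the percentage of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices ordered by descending score; sorted() is stable,
        # so tied classes keep their index order.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return 100.0 * correct / len(labels)
```

For instance, `topk_accuracy([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], [1, 1])` returns 50.0, while the same call with `k=2` returns 100.0.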
Pretrained models
For quick evaluation, we release Swin-T models trained for 100 epochs on various datasets and report their classification accuracy below as an example. Note that we save the model every 5 epochs during training, so the released best models may differ slightly from the reported results.
Dataset | Baseline (%) | Ours (%) |
---|---|---|
CIFAR-10 | 59.47 | 83.89 |
CIFAR-100 | 53.28 | 66.23 |
SVHN | 71.60 | 94.23 |
Flowers102 | 34.51 | 39.37 |
Clipart | 38.05 | 47.47 |
Infograph | 8.20 | 10.16 |
Painting | 35.92 | 41.86 |
Quickdraw | 24.08 | 69.41 |
Real | 73.47 | 75.59 |
Sketch | 11.97 | 38.55 |
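As a worked example of reading the table, the absolute improvement on each dataset is Ours minus Baseline. The snippet below recomputes it from the numbers above (`RESULTS` and `absolute_gains` are illustrative names): the largest gain is on Quickdraw (+45.33), the smallest on Infograph (+1.96), and the mean over the ten datasets is about +15.62.

```python
from __future__ import annotations

# (baseline, ours) values copied from the table above.
RESULTS = {
    "CIFAR-10": (59.47, 83.89),
    "CIFAR-100": (53.28, 66.23),
    "SVHN": (71.60, 94.23),
    "Flowers102": (34.51, 39.37),
    "Clipart": (38.05, 47.47),
    "Infograph": (8.20, 10.16),
    "Painting": (35.92, 41.86),
    "Quickdraw": (24.08, 69.41),
    "Real": (73.47, 75.59),
    "Sketch": (11.97, 38.55),
}


def absolute_gains(results: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Ours minus Baseline for each dataset, rounded to two decimals like the table."""
    return {name: round(ours - base, 2) for name, (base, ours) in results.items()}


gains = absolute_gains(RESULTS)
mean_gain = round(sum(gains.values()) / len(gains), 2)

# Print datasets from largest to smallest improvement.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11} +{gain:.2f}")
print(f"mean        +{mean_gain:.2f}")
```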
We provide a script that downloads the pretrained models directly from Google Drive:
python3 ./scripts/collect_models.py
Acknowledgments
This code is heavily based on Swin-Transformer. We thank the contributors of that project.
Citation
@InProceedings{liu2021efficient,
author = {Liu, Yahui and Sangineto, Enver and Bi, Wei and Sebe, Nicu and Lepri, Bruno and De Nadai, Marco},
title = {Efficient Training of Visual Transformers with Small Datasets},
booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
If you have any questions, please feel free to contact me (yahui.cvrs AT gmail.com).