
ResT: An Efficient Transformer for Visual Recognition

Official PyTorch implementation of ResTv1 and ResTv2, from the following paper:

ResT: An Efficient Transformer for Visual Recognition. NeurIPS 2021.
ResT V2: Simpler, Faster and Stronger. NeurIPS 2022.
By Qing-Long Zhang and Yu-Bin Yang
State Key Laboratory for Novel Software Technology at Nanjing University


<p align="center"> <img src="figures/fig_1.png" width=100% height=100% class="center"> </p>

ResTv1, initially described in the arXiv paper, capably serves as a general-purpose backbone for computer vision. It can handle input images of arbitrary size. In addition, ResT compresses the memory of standard multi-head self-attention (MSA) and models the interaction between heads while preserving their diversity.
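The memory compression above amounts to spatially downsampling the keys and values before attention, so each query attends to a reduced token grid. A minimal NumPy sketch of that idea (average pooling stands in for the paper's strided depth-wise convolution, projections are omitted, and the head-interaction convolution is left out; all names and shapes are illustrative, not the repository's API):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def emsa(x, num_heads, sr_ratio):
    """Efficient MSA sketch: queries attend to spatially pooled keys/values.

    x: (N, C) tokens on an H x W grid with N = H * W; C divisible by num_heads.
    sr_ratio: spatial reduction factor r; K/V length shrinks by r * r.
    """
    n, c = x.shape
    h = w = int(np.sqrt(n))
    d = c // num_heads

    # Downsample the K/V input: average-pool the token grid by sr_ratio.
    grid = x.reshape(h, w, c)
    pooled = grid.reshape(h // sr_ratio, sr_ratio,
                          w // sr_ratio, sr_ratio, c).mean(axis=(1, 3))
    kv = pooled.reshape(-1, c)                                # (N / r^2, C)

    # Split channels into heads; identity projections for brevity.
    q = x.reshape(n, num_heads, d).transpose(1, 0, 2)         # (heads, N, d)
    k = kv.reshape(-1, num_heads, d).transpose(1, 0, 2)       # (heads, N/r^2, d)
    v = k                                                     # share K as V here

    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))     # (heads, N, N/r^2)
    return (attn @ v).transpose(1, 0, 2).reshape(n, c)        # back to (N, C)

tokens = np.random.randn(14 * 14, 64)        # 14x14 token grid, 64 channels
y = emsa(tokens, num_heads=8, sr_ratio=2)    # attention cost drops by ~r^2
print(y.shape)                               # (196, 64)
```

With `sr_ratio=2`, the attention matrix shrinks from `N x N` to `N x N/4`, which is where the memory savings come from.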

Catalog


Results and Pre-trained Models

ImageNet-1K trained models

| name | resolution | acc@1 | #params | FLOPs | Throughput | model |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| ResTv1-Lite | 224x224 | 77.2 | 11M | 1.4G | 1246 | baidu |
| ResTv1-S | 224x224 | 79.6 | 14M | 1.9G | 1043 | baidu |
| ResTv1-B | 224x224 | 81.6 | 30M | 4.3G | 673 | baidu |
| ResTv1-L | 224x224 | 83.6 | 52M | 7.9G | 429 | baidu |
| ResTv2-T | 224x224 | 82.3 | 30M | 4.1G | 826 | baidu |
| ResTv2-T | 384x384 | 83.7 | 30M | 12.7G | 319 | baidu |
| ResTv2-S | 224x224 | 83.2 | 41M | 6.0G | 687 | baidu |
| ResTv2-S | 384x384 | 84.5 | 41M | 18.4G | 256 | baidu |
| ResTv2-B | 224x224 | 83.7 | 56M | 7.9G | 582 | baidu |
| ResTv2-B | 384x384 | 85.1 | 56M | 24.3G | 210 | baidu |
| ResTv2-L | 224x224 | 84.2 | 87M | 13.8G | 415 | baidu |
| ResTv2-L | 384x384 | 85.4 | 87M | 42.4G | 141 | baidu |

Note: The access code for baidu is `rest`. Pretrained models of ResTv1 are now available in google drive.
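Throughput figures like those in the table are typically reported as images per second over timed forward passes, with warm-up iterations excluded. A framework-agnostic sketch of that measurement (the model callable and all parameter names here are illustrative stand-ins, not the repository's benchmarking code):

```python
import time

def measure_throughput(model_fn, batch_size=64, warmup=5, iters=20):
    """Return images/second for a callable that processes one batch."""
    for _ in range(warmup):            # warm-up runs are excluded from timing
        model_fn()
    start = time.perf_counter()
    for _ in range(iters):
        model_fn()
    elapsed = time.perf_counter() - start
    return iters * batch_size / elapsed

# Stand-in "model": a cheap computation in place of a real forward pass.
fake_forward = lambda: sum(i * i for i in range(10_000))
print(f"{measure_throughput(fake_forward):.0f} img/s")
```

For GPU models the timed region would also need a device synchronization before reading the clock, since kernel launches are asynchronous.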

Installation

Please check INSTALL.md for installation instructions.

Evaluation

We give an example evaluation command for an ImageNet-1K pre-trained, then ImageNet-1K fine-tuned ResTv2-T:

Single-GPU

```
python main.py --model restv2_tiny --eval true \
--resume restv2_tiny_384.pth \
--input_size 384 --drop_path 0.1 \
--data_path /path/to/imagenet-1k
```

This should give:

```
* Acc@1 83.708 Acc@5 96.524 loss 0.777
```

Training

See TRAINING.md for training and fine-tuning instructions.

Acknowledgement

This repository is built using the timm library.

License

This project is released under the Apache License 2.0. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

ResTv1

```
@inproceedings{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Qinglong Zhang and Yu-bin Yang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=6Ab68Ip4Mu}
}
```

ResTv2

```
@article{zhang2022rest,
  title={ResT V2: Simpler, Faster and Stronger},
  author={Zhang, Qing-Long and Yang, Yu-Bin},
  journal={arXiv preprint arXiv:2204.07366},
  year={2022}
}
```

Third-party Implementation

[2022/05/26] ResT and ResT v2 have been integrated into PaddleViT; check out here for the third-party implementation on the Paddle framework!