ResFormer: Scaling ViTs with Multi-Resolution Training

Official PyTorch implementation of ResFormer: Scaling ViTs with Multi-Resolution Training, CVPR 2023 | Paper

Overview

<p align="center"> <img src="./imgs/network.png" width=100% height=100% class="center"> </p>

We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training to improve performance across a wide spectrum of testing resolutions, most of them unseen during training. In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss so that information is shared across scales. More importantly, to switch among varying resolutions effectively, especially novel ones at test time, we propose a global-local positional embedding strategy that changes smoothly with the input size.
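To make the scale consistency idea concrete, here is a minimal sketch of a smooth-L1 consistency term computed on class tokens, matching the `--distillation-type 'smooth-l1'` and `--distillation-target cls` options used in the training scripts. The function name `scale_consistency_loss`, the pairing of every two resolutions, and the choice to treat the higher-resolution token as a gradient-free target are assumptions made for illustration; the repository code may differ.

```python
import itertools

import torch
import torch.nn.functional as F


def scale_consistency_loss(cls_tokens):
    """Smooth-L1 consistency between class tokens of different resolutions.

    cls_tokens: list of (batch, dim) tensors, assumed to be ordered from
    the highest to the lowest input resolution (hypothetical convention).
    """
    loss = cls_tokens[0].new_zeros(())
    for teacher, student in itertools.combinations(cls_tokens, 2):
        # Assumption: the higher-resolution token acts as the target and
        # receives no gradient from this term.
        loss = loss + F.smooth_l1_loss(student, teacher.detach())
    return loss


# Toy usage: random tensors stand in for the class tokens produced by the
# same backbone at resolutions 224, 160 and 128.
torch.manual_seed(0)
tokens = [torch.randn(4, 384, requires_grad=True) for _ in (224, 160, 128)]
loss = scale_consistency_loss(tokens)
loss.backward()
```

In a full training step this term would be added to the per-resolution classification losses; how the two are weighted is not covered by this sketch.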

Installation

Image Classification

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install timm==0.5.4
pip install tensorboard

Scripts

Training on ImageNet-1k

The default script for training ResFormer-S-MR with training resolutions of 224, 160 and 128.

python -m torch.distributed.launch --nproc_per_node 8 main.py --data-path YOUR_DATA_PATH --model resformer_small_patch16 --output_dir YOUR_OUTPUT_PATH --batch-size 128 --pin-mem --input-size 224 160 128 --auto-resume --distillation-type 'smooth-l1' --distillation-target cls --sep-aug

The default script for training ResFormer-B-MR with training resolutions of 224, 160 and 128.

python -m torch.distributed.launch --nproc_per_node 8 main.py --data-path YOUR_DATA_PATH --model resformer_base_patch16 --output_dir YOUR_OUTPUT_PATH --batch-size 128 --pin-mem --input-size 224 160 128 --auto-resume --distillation-type 'smooth-l1' --distillation-target cls --sep-aug --epochs 200 --drop-path 0.2 --lr 8e-4 --warmup-epochs 20 --clip-grad 5.0 --cooldown-epochs 0

Model Zoo

Image Classification on ImageNet-1k

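| Name | Training Res | Top-1 (96) | Top-1 (128) | Top-1 (160) | Top-1 (224) | Top-1 (384) | Top-1 (512) | Model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResFormer-T-MR | 128, 160, 224 | 61.40 | 67.78 | 71.09 | 73.85 | 75.04 | 73.77 | google |
| ResFormer-S-MR | 128, 160, 224 | 73.59 | 78.24 | 80.39 | 82.16 | 82.72 | 82.00 | google |
| ResFormer-S-MR | 128, 224, 384 | 72.92 | 77.84 | 80.09 | 82.28 | 83.70 | 83.86 | google |
| ResFormer-B-MR | 128, 160, 224 | 75.86 | 79.74 | 81.52 | 82.72 | 83.29 | 82.63 | google |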
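
As a quick way to read the table, the sketch below copies the reported top-1 numbers into a dictionary and looks up the test resolution at which each model scores highest. The names `RESULTS` and `best_test_resolution` are hypothetical helpers for illustration, not part of the repository.

```python
# Top-1 accuracy (%) copied from the table above, keyed by test resolution.
RESULTS = {
    ("ResFormer-T-MR", (128, 160, 224)): {
        96: 61.40, 128: 67.78, 160: 71.09, 224: 73.85, 384: 75.04, 512: 73.77,
    },
    ("ResFormer-S-MR", (128, 160, 224)): {
        96: 73.59, 128: 78.24, 160: 80.39, 224: 82.16, 384: 82.72, 512: 82.00,
    },
    ("ResFormer-S-MR", (128, 224, 384)): {
        96: 72.92, 128: 77.84, 160: 80.09, 224: 82.28, 384: 83.70, 512: 83.86,
    },
    ("ResFormer-B-MR", (128, 160, 224)): {
        96: 75.86, 128: 79.74, 160: 81.52, 224: 82.72, 384: 83.29, 512: 82.63,
    },
}


def best_test_resolution(scores):
    """Return the test resolution with the highest top-1 accuracy."""
    return max(scores, key=scores.get)


for (name, train_res), scores in RESULTS.items():
    best = best_test_resolution(scores)
    print(f"{name} trained on {train_res}: best top-1 {scores[best]:.2f} at {best}")
```

For the three models trained on 128, 160 and 224, the best accuracy is reached at 384, a resolution larger than any seen during training.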

Catalog

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

@inproceedings{tian2022resformer,
  title={ResFormer: Scaling ViTs with Multi-Resolution Training},
  author={Tian, Rui and Wu, Zuxuan and Dai, Qi and Hu, Han and Qiao, Yu and Jiang, Yu-Gang},
  booktitle={CVPR},
  year={2023}
}