# SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Abdelrahman Shaker<sup>*1</sup>, Muhammad Maaz<sup>1</sup>, Hanoona Rasheed<sup>1</sup>, Salman Khan<sup>1</sup>, Ming-Hsuan Yang<sup>2,3</sup> and Fahad Shahbaz Khan<sup>1,4</sup>
Mohamed Bin Zayed University of Artificial Intelligence<sup>1</sup>, University of California Merced<sup>2</sup>, Google Research<sup>3</sup>, Linkoping University<sup>4</sup>
## :rocket: News
- (Jul 14, 2023): SwiftFormer has been accepted at ICCV 2023. :fire::fire:
- (Mar 27, 2023): Classification training and evaluation code, along with pre-trained models, is released.
## Classification on ImageNet-1K

### Models
Model | Top-1 accuracy | #params | GMACs | Latency | Ckpt | CoreML |
---|---|---|---|---|---|---|
SwiftFormer-XS | 75.7% | 3.5M | 0.6G | 0.7ms | XS | XS |
SwiftFormer-S | 78.5% | 6.1M | 1.0G | 0.8ms | S | S |
SwiftFormer-L1 | 80.9% | 12.1M | 1.6G | 1.1ms | L1 | L1 |
SwiftFormer-L3 | 83.0% | 28.5M | 4.0G | 1.9ms | L3 | L3 |
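If you just want to run a released checkpoint outside the distributed scripts, a minimal sketch is shown below. The `models.SwiftFormer_XS` import path, the checkpoint key layout, and the example image path are assumptions; adjust them to match this repository and your files.

```python
# Minimal sketch (not the official evaluation path): load a released checkpoint
# and classify a single image. The import path, checkpoint key layout, and the
# example image path are assumptions.
import torch
from PIL import Image
from torchvision import transforms

from models import SwiftFormer_XS  # assumed location of the model definitions

model = SwiftFormer_XS(num_classes=1000)
ckpt = torch.load("weights/SwiftFormer_XS_ckpt.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints wrap the weights under a "model" key
model.load_state_dict(state, strict=False)
model.eval()

# Standard ImageNet preprocessing at 224x224.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    pred = model(img).argmax(dim=1)
print("Predicted ImageNet class index:", pred.item())
```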
## Detection and Segmentation Qualitative Results
<p align="center"> <img src="images/detection_seg.png" width=100%> <br> </p> <p align="center"> <img src="images/semantic_seg.png" width=100%> <br> </p>Latency Measurement
The latency reported in SwiftFormer for iPhone 14 (iOS 16) is measured with the benchmark tool in Xcode 14.
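For reference, a Core ML export suitable for such on-device benchmarking could look roughly like the sketch below. It assumes `coremltools==5.2.0` (see Prerequisites), a `SwiftFormer_XS` constructor in the `models` module, and a 224x224 input resolution; adjust these to match the code you are exporting.

```python
# Rough sketch of a Core ML export for on-device benchmarking in Xcode.
# The import path and input resolution are assumptions.
import torch
import coremltools as ct

from models import SwiftFormer_XS  # assumed import path

model = SwiftFormer_XS(num_classes=1000).eval()
example_input = torch.rand(1, 3, 224, 224)  # assumed input resolution

# coremltools converts a traced (or scripted) TorchScript module.
traced = torch.jit.trace(model, example_input)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example_input.shape)],
)
mlmodel.save("SwiftFormer_XS.mlmodel")
```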
## SwiftFormer meets Android
Community-driven results on a Samsung Galaxy S23 Ultra (Qualcomm Snapdragon 8 Gen 2):
- Export & profiler results of `SwiftFormer_L1`:

  | QNN | 2.16 | 2.17 | 2.18 |
  |---|---|---|---|
  | Latency (msec) | 2.63 | 2.26 | 2.43 |

- Export & profiler results of the `SwiftFormerEncoder` block:

  | QNN | 2.16 | 2.17 | 2.18 |
  |---|---|---|---|
  | Latency (msec) | 2.17 | 1.69 | 1.7 |

  Refer to the script above for details of the input & block parameters.
❓ Interested in reproducing the results above?
Refer to Issue #14 for details about exporting & profiling.
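As a starting point before following Issue #14, an ONNX export of the model is sketched below; this is not necessarily the exact path used for the community numbers above, and the constructor name and input resolution are assumptions.

```python
# Rough sketch: export SwiftFormer-L1 to ONNX as a starting point for QNN
# profiling on Snapdragon devices (see Issue #14 for the exact settings used
# by the community). The import path and input resolution are assumptions.
import torch

from models import SwiftFormer_L1  # assumed import path

model = SwiftFormer_L1(num_classes=1000).eval()
dummy = torch.rand(1, 3, 224, 224)  # assumed input resolution

torch.onnx.export(
    model,
    dummy,
    "SwiftFormer_L1.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=13,
)
```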
## ImageNet

### Prerequisites
A `conda` virtual environment is recommended.
```bash
conda create --name=swiftformer python=3.9
conda activate swiftformer

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install timm
pip install coremltools==5.2.0
```
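After installation, a quick sanity check of the environment might look like the snippet below; the expected versions are the ones pinned above, and CUDA availability depends on your local driver setup.

```python
# Quick sanity check of the environment created above.
import torch
import torchvision
import timm
import coremltools

print("torch:", torch.__version__)              # expected: 1.11.0+cu113
print("torchvision:", torchvision.__version__)  # expected: 0.12.0+cu113
print("timm:", timm.__version__)
print("coremltools:", coremltools.__version__)  # expected: 5.2.0
print("CUDA available:", torch.cuda.is_available())
```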
### Data preparation
Download and extract ImageNet train and val images from http://image-net.org. The training and validation data are expected to be in the `train` and `val` folders, respectively:
```
|-- /path/to/imagenet/
    |-- train
    |-- val
```
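An optional check that this layout is readable is sketched below; `torchvision`'s `ImageFolder` expects one sub-directory per class inside `train/` and `val/`, and `/path/to/imagenet` is a placeholder for your actual dataset root.

```python
# Optional check that the ImageNet layout above is readable.
# "/path/to/imagenet" is a placeholder for your actual dataset root.
from torchvision import datasets

train_set = datasets.ImageFolder("/path/to/imagenet/train")
val_set = datasets.ImageFolder("/path/to/imagenet/val")
print(f"train: {len(train_set)} images across {len(train_set.classes)} classes")
print(f"val:   {len(val_set)} images across {len(val_set.classes)} classes")
```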
### Single machine multi-GPU training
We provide a training script for all models in `dist_train.sh`, using PyTorch distributed data parallel (DDP). To train SwiftFormer models on an 8-GPU machine:
```bash
sh dist_train.sh /path/to/imagenet 8
```
Note: specify which model's command you want to run inside the script. To reproduce the results of the paper, use a 16-GPU machine with a batch size of 128 or an 8-GPU machine with a batch size of 256. Auto Augmentation, CutMix, and MixUp are disabled for SwiftFormer-XS, and CutMix and MixUp are disabled for SwiftFormer-S.
### Multi-node training
On a Slurm-managed cluster, multi-node training can be launched as:
```bash
sbatch slurm_train.sh /path/to/imagenet SwiftFormer_XS
```
Note: specify the Slurm-specific parameters in the `slurm_train.sh` script.
### Testing
We provide a test script, `dist_test.sh`, using PyTorch distributed data parallel (DDP). For example, to test SwiftFormer-XS on an 8-GPU machine:
```bash
sh dist_test.sh SwiftFormer_XS 8 weights/SwiftFormer_XS_ckpt.pth
```
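If you prefer to evaluate a checkpoint without DDP, a single-GPU sketch is shown below. The `models.SwiftFormer_XS` import, the checkpoint key layout, and the dataset path are assumptions, and the exact preprocessing behind the reported numbers may differ slightly from this one.

```python
# Single-GPU evaluation sketch (the DDP script above is the supported path).
# The import path, checkpoint key layout, and dataset path are assumptions.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from models import SwiftFormer_XS  # assumed import path

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_loader = DataLoader(
    datasets.ImageFolder("/path/to/imagenet/val", preprocess),
    batch_size=128, num_workers=8,
)

model = SwiftFormer_XS(num_classes=1000)
ckpt = torch.load("weights/SwiftFormer_XS_ckpt.pth", map_location="cpu")
model.load_state_dict(ckpt.get("model", ckpt), strict=False)
model = model.cuda().eval()

correct = total = 0
with torch.no_grad():
    for images, targets in val_loader:
        preds = model(images.cuda(non_blocking=True)).argmax(dim=1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f"Top-1 accuracy: {100.0 * correct / total:.2f}%")
```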
## Citation
If you use our work, please consider citing us:
```bibtex
@InProceedings{Shaker_2023_ICCV,
    author    = {Shaker, Abdelrahman and Maaz, Muhammad and Rasheed, Hanoona and Khan, Salman and Yang, Ming-Hsuan and Khan, Fahad Shahbaz},
    title     = {SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2023},
}
```
## Contact
If you have any questions, please create an issue on this repository or contact us at abdelrahman.youssief@mbzuai.ac.ae.
## Acknowledgement
Our codebase builds on the LeViT and EfficientFormer repositories. We thank the authors for their open-source implementations.