Fully Attentional Networks

Project Page | Paper | Slides | Poster

Understanding The Robustness in Vision Transformers.
Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng and Jose M. Alvarez.
International Conference on Machine Learning, 2022.

<p align="center"> <img src="demo/Teaser.png" width=60% height=60% class="center"> </p>

This repository contains the official PyTorch implementation of the training/evaluation code and the pretrained models of the Fully Attentional Network (FAN).

FAN is a family of general-purpose Vision Transformer backbones that are highly robust to unseen natural corruptions in various visual recognition tasks.

Dependencies

The repo is built on the timm library, which can be installed together with the pinned torchvision version via:

pip3 install timm==0.5.4
pip3 install torchvision==0.9.0
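
As a quick sanity check, a minimal sketch (not specific to this repo) that verifies the pinned versions above are the ones actually importable in your environment:

```python
# Verify the pinned dependency versions from the install step above.
import timm
import torchvision

assert timm.__version__ == "0.5.4", f"unexpected timm version: {timm.__version__}"
assert torchvision.__version__.startswith("0.9"), f"unexpected torchvision version: {torchvision.__version__}"
print("timm", timm.__version__, "| torchvision", torchvision.__version__)
```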

Dataset preparation

Download the clean ImageNet dataset and the ImageNet-C dataset, and structure them as follows:

/path/to/imagenet-C/
  clean/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  corruption1/
    severity1/
      class1/
        img3.jpeg
      class2/
        img4.jpeg
    severity2/
      class1/
        img3.jpeg
      class2/
        img4.jpeg

For other out-of-distribution benchmarks, we evaluate on ImageNet-A and ImageNet-R.
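
For a quick local check against the layout above, here is a minimal evaluation sketch, assuming a ready `model` whose 1000-way output indices follow torchvision ImageFolder's sorted-class order and the usual eval transforms; the scripts shipped with this repo (see below) remain the reference evaluation:

```python
# Walk the ImageNet-C tree (corruption/severity/class/img) and report
# top-1 accuracy per (corruption, severity) pair.
import os
import torch
from torchvision import datasets, transforms

def eval_imagenet_c(model, root, device="cuda", batch_size=64):
    tf = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    model.eval().to(device)
    results = {}
    for corruption in sorted(os.listdir(root)):
        if corruption == "clean":
            continue  # score only the corrupted splits
        for severity in sorted(os.listdir(os.path.join(root, corruption))):
            ds = datasets.ImageFolder(os.path.join(root, corruption, severity), tf)
            loader = torch.utils.data.DataLoader(ds, batch_size=batch_size,
                                                 num_workers=8, pin_memory=True)
            correct = total = 0
            with torch.no_grad():
                for images, targets in loader:
                    preds = model(images.to(device)).argmax(dim=1).cpu()
                    correct += (preds == targets).sum().item()
                    total += targets.numel()
            results[(corruption, severity)] = correct / total
    return results
```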

Results and Pre-trained Models

FAN-ViT ImageNet-1K trained models

| Model | Resolution | IN-1K | IN-C | IN-A | IN-R | #Params | Download |
|:------|:----------:|:-----:|:----:|:----:|:----:|:-------:|:--------:|
| FAN-T-ViT | 224x224 | 79.2 | 57.5 | 15.6 | 42.5 | 7.3M | model |
| FAN-S-ViT | 224x224 | 82.5 | 64.5 | 29.1 | 50.4 | 28.0M | model |
| FAN-B-ViT | 224x224 | 83.6 | 67.0 | 35.4 | 51.8 | 54.0M | model |
| FAN-L-ViT | 224x224 | 83.9 | 67.7 | 37.2 | 53.1 | 80.5M | model |

The IN-1K, IN-C, IN-A and IN-R columns report top-1 accuracy (%).

FAN-Hybrid ImageNet-1K trained models

| Model | Resolution | IN-1K / IN-C | City / City-C | COCO / COCO-C | #Params | Download |
|:------|:----------:|:------------:|:-------------:|:-------------:|:-------:|:--------:|
| FAN-T-Hybrid | 224x224 | 80.1 / 57.4 | 81.2 / 57.1 | 50.2 / 33.1 | 7.4M | model |
| FAN-S-Hybrid | 224x224 | 83.5 / 64.7 | 81.5 / 66.4 | 53.3 / 38.7 | 26.3M | model |
| FAN-B-Hybrid | 224x224 | 83.9 / 66.4 | 82.2 / 66.9 | 54.2 / 40.6 | 50.4M | model |
| FAN-L-Hybrid | 224x224 | 84.3 / 68.3 | 82.3 / 68.7 | 55.1 / 42.0 | 76.8M | model |

Each cell reports clean / corrupted performance on the corresponding benchmark.

FAN-Hybrid ImageNet-22K trained models

| Model | Resolution | IN-1K / IN-C | #Params | Download |
|:------|:----------:|:------------:|:-------:|:--------:|
| FAN-B-Hybrid | 224x224 | 85.3 / 70.5 | 50.4M | model |
| FAN-B-Hybrid | 384x384 | 85.6 / - | 50.4M | model |
| FAN-L-Hybrid | 224x224 | 86.5 / 73.6 | 76.8M | model |
| FAN-L-Hybrid | 384x384 | 87.1 / - | 76.8M | model |

The pre-trained weights for FAN-B-Hybrid and FAN-L-Hybrid trained on ImageNet-22K without fine-tuning on ImageNet-1K are also provided. Checkpoints can be downloaded by clicking on the model names above.
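
A hedged sketch for loading one of these checkpoints; the filename, the `state_dict` key, and the `models` import (this repo's model definitions, assumed to register the FAN variants with timm on import) may differ per file:

```python
import torch
import timm
import models  # assumed: importing this repo's model package registers the FAN variants with timm

model = timm.create_model("fan_tiny_8_p4_hybrid", pretrained=False)
ckpt = torch.load("fan_tiny_8_p4_hybrid.pth.tar", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # unwrap if the weights are nested under a key
model.load_state_dict(state_dict)
model.eval()
```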

Demos

Semantic Segmentation on Cityscapes-C

<p align="center"> <img src="demo/Demo_CityC.gif" alt="animated"> </p>

ImageNet-1K Training

FAN-T training on ImageNet-1K with four 8-GPU nodes (set $rank_num to the number of nodes and $rank_index to this node's rank):

python3 -m torch.distributed.launch --nproc_per_node=8 --nnodes=$rank_num \
	--node_rank=$rank_index --master_addr="ip.addr" --master_port=$MASTER_PORT \
	main.py /PATH/TO/IMAGENET/ --model fan_tiny_8_p4_hybrid -b 32 --sched cosine --epochs 300 \
	--opt adamw -j 16 --warmup-epochs 5 \
	--lr 10e-4 --drop-path .1 --img-size 224 \
	--output ../fan_tiny_8_p4_hybrid/ \
	--amp --model-ema

Robustness on ImageNet-C

bash scripts/imagenet_c_val.sh $model_name $ckpt
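
If you used the pure-Python sketch from the dataset section instead of the script, its per-(corruption, severity) accuracies can be collapsed into per-corruption means and a single ImageNet-C number; note the paper's exact aggregation may differ from this plain average:

```python
# Average the per-(corruption, severity) accuracies returned by eval_imagenet_c.
from collections import defaultdict

def summarize_imagenet_c(results):
    per_corruption = defaultdict(list)
    for (corruption, _severity), acc in results.items():
        per_corruption[corruption].append(acc)
    per_corruption = {c: sum(a) / len(a) for c, a in per_corruption.items()}
    overall = sum(per_corruption.values()) / len(per_corruption)
    return per_corruption, overall
```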

Measurement on ImageNet-A

bash scripts/imagenet_a_val.sh $model_name $ckpt
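
ImageNet-A (like ImageNet-R) covers a 200-class subset of ImageNet-1K, so evaluation typically masks the 1000-way logits down to those 200 classes before taking the argmax. A sketch of that step, where `subset_indices` (the 200 ImageNet-1K class ids, loaded from the benchmark's metadata) is an assumption:

```python
import torch

def masked_top1(logits, targets, subset_indices):
    # Keep only the logits of the 200 classes present in the benchmark,
    # then score predictions in subset space (targets are subset-indexed).
    sub_logits = logits[:, subset_indices]
    preds = sub_logits.argmax(dim=1)
    return (preds == targets).float().mean().item()
```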

Measurement on ImageNet-R

bash scripts/imagenet_r_val.sh $model_name $ckpt

Acknowledgement

This repository is built using the timm library and the DeiT, PVT, and SegFormer repositories.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{zhou2022understanding,
  title     = {Understanding The Robustness in Vision Transformers},
  author    = {Zhou, Daquan and Yu, Zhiding and Xie, Enze and Xiao, Chaowei and Anandkumar, Anima and Feng, Jiashi and Alvarez, Jose M.},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2022}
}