# Robustifying Token Attention for Vision Transformers
Yong Guo, David Stutz, and Bernt Schiele. ICCV 2023.
Paper | Slides | Poster
<p align="center"> <img src="imgs/motivation.jpg" width=100% class="center"> </p>

This repository contains the official PyTorch implementation and pretrained models of *Robustifying Token Attention for Vision Transformers*.
## Catalog
- Pre-trained models for image classification
- Pre-trained models for semantic segmentation
- Evaluation and Training Code
## Dependencies
Our code is built on PyTorch and the timm library. Please see requirements.txt for the full list of dependencies.
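A minimal setup sketch (assuming a fresh Python environment; the exact pinned versions live in requirements.txt):

```shell
# Install the pinned dependencies (run from the repository root).
pip install -r requirements.txt

# PyTorch is typically installed separately to match your CUDA version;
# see pytorch.org for the command appropriate to your setup.
```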
## Dataset Preparation
- Image Classification: ImageNet and related robustness benchmarks

Please download the clean ImageNet dataset. We evaluate the models on various robustness benchmarks, including ImageNet-C, ImageNet-A, ImageNet-P, and ImageNet-R.
- Semantic Segmentation: Cityscapes and related robustness benchmarks

Please download the clean Cityscapes dataset. We evaluate the models on various robustness benchmarks, including Cityscapes-C and ACDC (test set).
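For reference, ImageNet-C results are conventionally reported as the mean Corruption Error (mCE) from Hendrycks & Dietterich: per-corruption error rates are summed over the five severity levels, normalized by AlexNet's error rates, and averaged over corruption types. A small sketch of that standard metric (the numbers below are toy values, not results from this paper):

```python
# Sketch of the standard ImageNet-C metric (mean Corruption Error, mCE).
# Error rates are top-1 errors in [0, 1] at severities 1..5.

def corruption_error(model_err, alexnet_err):
    """Normalized Corruption Error for one corruption type:
    the model's errors summed over severities, divided by AlexNet's."""
    return sum(model_err) / sum(alexnet_err)

def mean_corruption_error(model_errs, alexnet_errs):
    """mCE: the per-corruption CEs averaged over all corruption types,
    expressed as a percentage."""
    ces = [corruption_error(m, a) for m, a in zip(model_errs, alexnet_errs)]
    return 100.0 * sum(ces) / len(ces)

# Toy numbers: two corruption types, five severities each.
model = [[0.30, 0.35, 0.40, 0.50, 0.60], [0.25, 0.30, 0.35, 0.45, 0.55]]
alexnet = [[0.60, 0.70, 0.80, 0.90, 0.95], [0.55, 0.65, 0.75, 0.85, 0.90]]
print(round(mean_corruption_error(model, alexnet), 2))  # lower is better
```

Lower mCE is better; 100 corresponds to AlexNet-level robustness.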
## Training and Evaluation (using TAP and ADL)
- Image Classification: please see how to train/evaluate the FAN and RVT models in TAPADL_FAN and TAPADL_RVT, respectively.
- Semantic Segmentation: please see how to train/evaluate our segmentation model in TAPADL_FAN/segmentation.
## Acknowledgement
This repository is built using the timm library and the RVT and FAN repositories.
## Citation
If you find this repository helpful, please consider citing:
```bibtex
@inproceedings{guo2023robustifying,
  title={Robustifying token attention for vision transformers},
  author={Guo, Yong and Stutz, David and Schiele, Bernt},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2023}
}
```