Official PyTorch Implementation of SegViT [ckpt]

SegViT: Semantic Segmentation with Plain Vision Transformers

Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan.

NeurIPS 2022. [paper]

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

Bowen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen and Yifan Liu.

IJCV 2023. [paper] [we are refactoring code for release ...]

This repository contains the official PyTorch implementation of SegViT and its extended version, SegViT v2, including training and evaluation code and pretrained models.

Highlights

As shown in the following figure, the similarity between the class query and the image features is transferred to the segmentation mask.

<img src="./resources/v2_figure_1.png"> <img src="./resources/teaser-01.png"> <img src="resources/atm_arch-1.png">
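The highlight above can be illustrated with a small sketch. This is not the repository's code: it uses NumPy instead of PyTorch, and the function name `attention_to_mask`, the scaled dot product, and the sigmoid activation are assumptions chosen to show how a query/feature similarity map can be read directly as a per-class mask.

```python
import numpy as np

def attention_to_mask(class_queries, image_features):
    """Turn query/feature similarity into per-class soft masks.

    class_queries:  (num_classes, dim) array, one embedding per class.
    image_features: (height, width, dim) array of per-location features.
    Returns a (num_classes, height, width) array with values in (0, 1).
    """
    height, width, dim = image_features.shape
    tokens = image_features.reshape(height * width, dim)
    # Scaled dot-product similarity between every class query and every location.
    similarity = class_queries @ tokens.T / np.sqrt(dim)
    # Assumption: a sigmoid (not a softmax) so each class gets an independent mask.
    masks = 1.0 / (1.0 + np.exp(-similarity))
    return masks.reshape(-1, height, width)

rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 8))      # 3 hypothetical classes, 8-dim embeddings
features = rng.standard_normal((4, 5, 8))  # hypothetical 4x5 feature map
masks = attention_to_mask(queries, features)
print(masks.shape)  # (3, 4, 5)
```

Each output channel is the similarity map of one class query, so no separate per-pixel classifier is needed to produce the mask in this sketch.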

Getting started

  1. Install the mmsegmentation library and some required packages.
pip install mmcv-full==1.4.4 mmsegmentation==0.24.0
pip install scipy timm
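Because the install step pins exact versions, it can help to confirm what actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {"mmcv-full": "1.4.4", "mmsegmentation": "0.24.0"}

def check_pins(pinned=PINNED, lookup=metadata.version):
    """Return {package: (expected, found)} for each mismatched or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems

if __name__ == "__main__":
    for name, (expected, found) in check_pins().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means both pinned packages are installed at the expected versions.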

Training

bash tools/dist_train.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {num_gpus}

Evaluation

bash tools/dist_test.sh configs/segvit/segvit_vit-l_jax_640x640_160k_ade20k.py {path_to_ckpt} {num_gpus} --eval mIoU

Datasets

Please follow the mmsegmentation data preparation instructions to set up the datasets.

Results

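| Model backbone | Dataset | mIoU | mIoU (ms) | GFLOPs | ckpt |
| --- | --- | --- | --- | --- | --- |
| ViT-Base | ADE20k | 51.3 | 53.0 | 120.9 | model |
| ViT-Large (Shrunk) | ADE20k | 53.9 | 55.1 | 373.5 | model |
| ViT-Large | ADE20k | 54.6 | 55.2 | 637.9 | model |
| ViT-Large (Shrunk) | COCOStuff10K | 49.1 | 49.4 | 224.8 | model |
| ViT-Large | COCOStuff10K | 49.9 | 50.3 | 383.9 | model |
| ViT-Large (Shrunk) | PASCAL-Context (59cls) | 62.3 | 63.7 | 186.9 | model |
| ViT-Large | PASCAL-Context (59cls) | 64.1 | 65.3 | 321.6 | model |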
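For readers unfamiliar with the mIoU metric reported above, here is a minimal NumPy sketch of how mean intersection-over-union is conventionally computed from a confusion matrix. It is illustrative only and is not the evaluation code used to produce these results: the function name `mean_iou` is hypothetical, and ignore labels are not handled.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both maps
    return float((intersection[valid] / union[valid]).mean())

target = np.array([[0, 0, 1], [1, 2, 2]])  # toy ground-truth label map
pred = np.array([[0, 1, 1], [1, 2, 0]])    # toy prediction
print(mean_iou(pred, target, num_classes=3))  # per-class IoUs are 1/3, 2/3, 1/2
```

In this toy example the three per-class IoUs average to 0.5; the table reports the same kind of average as a percentage over each dataset's classes.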

License

For academic use, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial use, please contact the authors.

Citation

@article{zhang2022segvit,
  title={SegViT: Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Tian, Zhi and Tang, Quan and Chu, Xiangxiang and Wei, Xiaolin and Shen, Chunhua and Liu, Yifan},
  journal={NeurIPS},
  year={2022}
}

@article{zhang2023segvitv2,
  title={SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers},
  author={Zhang, Bowen and Liu, Liyang and Phan, Minh Hieu and Tian, Zhi and Shen, Chunhua and Liu, Yifan},
  journal={IJCV},
  year={2023}
}