# SEgmentation TRansformers -- SETR


**Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers**<br>
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, Li Zhang<br>
CVPR 2021

**Vision Transformers: From Semantic Segmentation to Dense Prediction** [Springer] [arXiv]<br>
Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng<br>
IJCV, July 2024
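SETR casts segmentation as sequence-to-sequence learning: the input image is split into fixed-size patches (16x16 in the paper), each patch is embedded as one token, and a pure transformer encoder processes the resulting sequence. A quick sanity check of the sequence lengths implied by the crop sizes used in the tables below (a minimal sketch, not code from this repo):

```python
def seq_len(h, w, patch=16):
    """Number of patch tokens for an h x w input split into
    non-overlapping patch x patch patches (both sides assumed
    divisible by `patch`)."""
    return (h // patch) * (w // patch)

# Crop sizes used in the experiments below
print(seq_len(768, 768))  # Cityscapes: 2304 tokens
print(seq_len(512, 512))  # ADE20K: 1024 tokens
print(seq_len(480, 480))  # Pascal Context: 900 tokens
```

So the Cityscapes crop yields a 2304-token sequence, ADE20K 1024 tokens, and Pascal Context 900.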

## SETR

### Cityscapes

| Method | Crop Size | Batch size | Iterations | Set | mIoU | Model | Config |
|---|---|---|---|---|---|---|---|
| SETR-Naive | 768x768 | 8 | 40k | val | 77.37 | google drive | config |
| SETR-Naive | 768x768 | 8 | 80k | val | 77.90 | google drive | config |
| SETR-MLA | 768x768 | 8 | 40k | val | 76.65 | google drive | config |
| SETR-MLA | 768x768 | 8 | 80k | val | 77.24 | google drive | config |
| SETR-PUP | 768x768 | 8 | 40k | val | 78.39 | google drive | config |
| SETR-PUP | 768x768 | 8 | 80k | val | 79.34 | google drive | config |
| SETR-Naive-Base | 768x768 | 8 | 40k | val | 75.54 | google drive | config |
| SETR-Naive-Base | 768x768 | 8 | 80k | val | 76.25 | google drive | config |
| SETR-Naive-DeiT | 768x768 | 8 | 40k | val | 77.85 | google drive | config |
| SETR-Naive-DeiT | 768x768 | 8 | 80k | val | 78.66 | google drive | config |
| SETR-MLA-DeiT | 768x768 | 8 | 40k | val | 78.04 | google drive | config |
| SETR-MLA-DeiT | 768x768 | 8 | 80k | val | 78.98 | google drive | config |
| SETR-PUP-DeiT | 768x768 | 8 | 40k | val | 78.79 | google drive | config |
| SETR-PUP-DeiT | 768x768 | 8 | 80k | val | 79.45 | google drive | config |
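The mIoU figures in these tables are the per-class intersection-over-union averaged over classes. A minimal pure-Python sketch of the metric (illustrative only; the repo relies on MMSegmentation's evaluation code):

```python
def miou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes. pred and gt are flat lists of class
    indices of equal length; pixels whose label is `ignore_index`
    are excluded, as in Cityscapes/ADE20K evaluation."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[p] += 1  # false positive for predicted class
            union[g] += 1  # false negative for true class
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)
```

For example, `miou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.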

### ADE20K

| Method | Crop Size | Batch size | Iterations | Set | mIoU | mIoU (ms+flip) | Model | Config |
|---|---|---|---|---|---|---|---|---|
| SETR-Naive | 512x512 | 16 | 160k | val | 48.06 | 48.80 | google drive | config |
| SETR-MLA | 512x512 | 8 | 160k | val | 47.79 | 50.03 | google drive | config |
| SETR-MLA | 512x512 | 16 | 160k | val | 48.64 | 50.28 | google drive | config |
| SETR-MLA-DeiT | 512x512 | 16 | 160k | val | 46.15 | 47.71 | google drive | config |
| SETR-PUP | 512x512 | 16 | 160k | val | 48.62 | 50.09 | google drive | config |
| SETR-PUP-DeiT | 512x512 | 16 | 160k | val | 46.34 | 47.30 | google drive | config |

### Pascal Context

| Method | Crop Size | Batch size | Iterations | Set | mIoU | mIoU (ms+flip) | Model | Config |
|---|---|---|---|---|---|---|---|---|
| SETR-Naive | 480x480 | 16 | 80k | val | 52.89 | 53.61 | google drive | config |
| SETR-MLA | 480x480 | 8 | 80k | val | 54.39 | 55.39 | google drive | config |
| SETR-MLA | 480x480 | 16 | 80k | val | 55.01 | 55.83 | google drive | config |
| SETR-MLA-DeiT | 480x480 | 16 | 80k | val | 52.91 | 53.74 | google drive | config |
| SETR-PUP | 480x480 | 16 | 80k | val | 54.37 | 55.27 | google drive | config |
| SETR-PUP-DeiT | 480x480 | 16 | 80k | val | 52.00 | 52.50 | google drive | config |

## HLG

### ImageNet-1K

HLG classification code is under the `hlg-classification/` folder.

| Model | Resolution | Params | FLOPs | Top-1 (%) | Config | Pretrained Model |
|---|---|---|---|---|---|---|
| HLG-Tiny | 224 | 11M | 2.1G | 81.1 | `hlg_tiny_224.yaml` | google drive |
| HLG-Small | 224 | 24M | 4.7G | 82.3 | `hlg_small_224.yaml` | google drive |
| HLG-Medium | 224 | 43M | 9.0G | 83.6 | `hlg_medium_224.yaml` | google drive |
| HLG-Large | 224 | 84M | 15.9G | 84.1 | `hlg_large_224.yaml` | google drive |

### Cityscapes

HLG segmentation shares the same folder as SETR.

| Method | Crop Size | Batch size | Iterations | Set | mIoU | Config |
|---|---|---|---|---|---|---|
| SETR-HLG-Small | 768x768 | 16 | 40k | val | 81.8 | config |
| SETR-HLG-Medium | 768x768 | 16 | 40k | val | 82.5 | config |
| SETR-HLG-Large | 768x768 | 16 | 40k | val | 82.9 | config |

### ADE20K

HLG segmentation shares the same folder as SETR.

| Method | Crop Size | Batch size | Iterations | Set | mIoU | Config |
|---|---|---|---|---|---|---|
| SETR-HLG-Small | 512x512 | 16 | 160k | val | 47.3 | config |
| SETR-HLG-Medium | 512x512 | 16 | 160k | val | 49.3 | config |
| SETR-HLG-Large | 512x512 | 16 | 160k | val | 49.8 | config |

### COCO

HLG detection code is under the `hlg-detection/` folder.

| Backbone | Lr schd | box AP | Config |
|---|---|---|---|
| SETR-HLG-Small | 1x | 44.4 | config |
| SETR-HLG-Medium | 1x | 46.6 | config |
| SETR-HLG-Large | 1x | 47.7 | config |

## Installation

Our project is developed on top of MMSegmentation. Please follow the official MMSegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

🔥🔥 SETR is now available in MMSegmentation. 🔥🔥

### A from-scratch setup script

#### Linux

Here is a full script for setting up SETR with conda and linking the dataset path (assuming your dataset path is `$DATA_ROOT`).

```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab

conda install pytorch=1.6.0 torchvision cudatoolkit=10.1 -c pytorch -y
pip install mmcv-full==1.2.2 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.6.0/index.html
git clone https://github.com/fudan-zvg/SETR.git
cd SETR
pip install -e .  # or "python setup.py develop"
pip install -r requirements/optional.txt

mkdir data
ln -s $DATA_ROOT data
```
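After running the script, a quick way to confirm the environment is usable is to check that the expected packages are importable. This is a hypothetical helper, not part of the repo; the package names assume the pinned versions above:

```python
import importlib.util

def check_env(packages=("torch", "torchvision", "mmcv", "mmseg")):
    """Report which of the expected packages can be found on the
    current interpreter's path, without actually importing them."""
    return {name: importlib.util.find_spec(name) is not None
            for name in packages}

# Example: print any package that is missing from the environment
missing = [name for name, ok in check_env().items() if not ok]
print("missing packages:", missing or "none")
```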

#### Windows (experimental)

Here is a full script for setting up SETR with conda and linking the dataset path (assuming your dataset path is `%DATA_ROOT%`; note that it must be an absolute path).

```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab

conda install pytorch=1.6.0 torchvision cudatoolkit=10.1 -c pytorch
set PATH=full\path\to\your\cpp\compiler;%PATH%
pip install mmcv

git clone https://github.com/fudan-zvg/SETR.git
cd SETR
pip install -e .  # or "python setup.py develop"
pip install -r requirements/optional.txt

mklink /D data %DATA_ROOT%
```

## Get Started

### Pre-trained models

The pre-trained model will be downloaded automatically and placed in a suitable location when you run the training command. If the download fails due to network issues, you can fetch the pre-trained models manually from here (ViT) and here (DeiT).

### Train

```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
# For example, train a SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8
```

### Single-scale testing

```shell
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
    work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
    8 --eval mIoU
```

### Multi-scale testing

Use the config files ending in `_MS.py` under `configs/SETR`.

```shell
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
    work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
    8 --eval mIoU
```
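Multi-scale + flip testing averages class scores over several input scales and a horizontal flip, with flipped predictions flipped back before averaging; this is what produces the mIoU (ms+flip) columns above. A minimal pure-Python sketch of the flip-averaging step (the `model` callable and the score layout are illustrative, not the repo's API):

```python
def hflip(score_map):
    """Flip a [H][W] map along the width axis."""
    return [row[::-1] for row in score_map]

def flip_average(model, image):
    """Average per-class scores from an image and its horizontal flip.
    `model` maps a [H][W] image to {class_id: [H][W] scores}; the
    flipped prediction is flipped back so pixels line up."""
    normal = model(image)
    flipped = model(hflip(image))
    return {
        c: [[(n + f) / 2 for n, f in zip(nrow, frow)]
            for nrow, frow in zip(normal[c], hflip(flipped[c]))]
        for c in normal
    }
```

Multi-scale evaluation additionally resizes the image to several scales and averages the (resized-back) score maps the same way before taking the per-pixel argmax.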

### Generate the png files to be submitted to the official evaluation server

Please see getting_started.md for more basic usage of training and testing.
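For Cityscapes specifically, the official evaluation server expects full label IDs in the submitted png files, while models are typically trained on the 19 train IDs, so predictions must be remapped before submission. A minimal sketch of that remapping (the mapping follows the official cityscapesScripts labels table; the helper name is illustrative):

```python
# Cityscapes train ID -> label ID, per the official cityscapesScripts
# labels table (the 19 evaluated classes).
TRAINID_TO_LABELID = {
    0: 7, 1: 8, 2: 11, 3: 12, 4: 13, 5: 17, 6: 19, 7: 20, 8: 21, 9: 22,
    10: 23, 11: 24, 12: 25, 13: 26, 14: 27, 15: 28, 16: 31, 17: 32, 18: 33,
}

def to_label_ids(pred):
    """Convert a [H][W] map of train IDs to label IDs; the ignore
    value 255 (and anything unmapped) becomes 0 ('unlabeled')."""
    return [[TRAINID_TO_LABELID.get(p, 0) for p in row] for row in pred]
```

The resulting map would then be saved as a single-channel png per image before upload.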

## Reference

```bibtex
@inproceedings{SETR,
  title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers},
  author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
  booktitle={CVPR},
  year={2021}
}
@article{SETR-HLG,
  title={Vision transformers: From semantic segmentation to dense prediction},
  author={Zhang, Li and Lu, Jiachen and Zheng, Sixiao and Zhao, Xinxuan and Zhu, Xiatian and Fu, Yanwei and Xiang, Tao and Feng, Jianfeng and Torr, Philip HS},
  journal={International Journal of Computer Vision},
  pages={1--21},
  year={2024},
  publisher={Springer}
}
```

## License

MIT

## Acknowledgement

Thanks to the previously open-sourced repos: MMSegmentation and pytorch-image-models.