Home

Awesome

Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers

PWC

The official implementation of our paper.

Install and Usage

Please follow ViT-Adapter to prepare the environment and datasets.

Don't forget to convert the pre-trained weights of BEiT and BEiTv2 with beit2mmseg.py in tools.

Training

Please use the following commands. We fix random seeds to reduce randomness.

To train base models on ADE20K with 4gpus:

sh ./tools/dist_train.sh configs/ade/mask2former_beit_base_parallel_separate_slim_640_80k_ade20k_ss.py 4 --seed 0

To train large models on ADE20K with 8gpus:

sh ./tools/dist_train.sh configs/ade/mask2former_beit_large_parallel_separate_slim_640_80k_ade20k_ss.py 8 --seed 0

To train base models on Pascal Context with 4gpus:

sh ./tools/dist_train.sh configs/pascal/mask2former_beit_base_parallel_separate_slim_480_20k_pascal_ss.py 4 --seed 10

To train large models on Pascal Context with 4gpus:

sh ./tools/dist_train.sh configs/pascal/mask2former_beit_large_parallel_separate_slim_480_20k_pascal_ss.py 4 --seed 10

To train large models on COCO-Stuff 164K with 8gpus:

sh ./tools/dist_train.sh configs/coco164k/mask2former_beit_large_parallel_separate_slim_640_80k_coco164_ss.py 8 --seed 0

Pre-trained Models

Coming soon!

Acknowledgement

The code is largely based on ViT-Adapter and MMSegmentation.