[CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
We release the code and trained models for our CVPR 2024 paper, Open-Vocabulary Semantic Segmentation with Image Embedding Balancing.
Getting started
Environment setup
First, clone this repo:
```bash
git clone https://github.com/slonetime/EBSeg.git
```
Then, create a new conda environment and install the required packages:
```bash
cd EBSeg
conda create --name ebseg python=3.9
conda activate ebseg
pip install -r requirements.txt
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
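Optionally, you can check that the core dependencies resolved correctly. The one-liner below is only a sanity check (it assumes the steps above completed without errors):

```bash
# Print the installed PyTorch and Detectron2 versions as a quick sanity check.
python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__)"
```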
Finally, build and install the MultiScaleDeformableAttention op from Mask2Former:
```bash
cd ebseg/model/mask2former/modeling/pixel_decoder/ops/
sh make.sh
```
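If the build succeeded, the compiled op should be importable. This check is optional; MultiScaleDeformableAttention is the module name used by Mask2Former's build script, and importing it assumes a CUDA-enabled PyTorch install:

```bash
# Fails with an ImportError if the CUDA op did not compile and install correctly.
python -c "import MultiScaleDeformableAttention; print('MSDeformAttn op OK')"
```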
Data preparation
We follow the dataset preparation process of SAN, so please follow the instructions at https://github.com/MendelXu/SAN?tab=readme-ov-file#data-preparation.
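For reference, a typical dataset root after following SAN's instructions looks roughly like the sketch below; the exact subdirectories depend on which benchmarks you prepare with SAN's conversion scripts:

```
datasets/
├── coco/                  # COCO-Stuff (training)
├── ADEChallengeData2016/  # A-150
├── ADE20K_2021_17_01/     # A-847
└── VOCdevkit/             # VOC, PC-59 and PC-459
```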
Training
First, change the config_file path, dataset_dir path and output_dir path in train.sh. Then, you can train an EBSeg model with the following command:
```bash
bash train.sh
```
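The variable names below mirror the paths mentioned above; all values are placeholders (pick a real config file from the repository):

```bash
# Illustrative edits inside train.sh — every path here is a placeholder.
config_file=configs/<your_config>.yaml   # an EBSeg config of your choice
dataset_dir=/path/to/datasets            # dataset root from the step above
output_dir=output/ebseg                  # where checkpoints and logs are written
```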
Inference with our trained model
Download our trained models from the links in the following table (results reported in mIoU):
| Model | A-847 | PC-459 | A-150 | PC-59 | VOC |
|---|---|---|---|---|---|
| EBSeg-B | 11.1 | 17.3 | 30.0 | 56.7 | 94.6 |
| EBSeg-L | 13.7 | 21.0 | 32.8 | 60.2 | 96.4 |
As with training, change the config_file path, dataset_dir path, checkpoint path and output_dir path in test.sh. Then, test an EBSeg model by:
```bash
bash test.sh
```
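As for training, here is an illustrative sketch of the paths to set in test.sh (all values are placeholders):

```bash
# Illustrative edits inside test.sh — every path here is a placeholder.
config_file=configs/<your_config>.yaml   # same config used for training
dataset_dir=/path/to/datasets            # dataset root
checkpoint=/path/to/ebseg_model.pth      # a downloaded or self-trained checkpoint
output_dir=output/ebseg_eval             # where evaluation results are written
```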
Acknowledgments
Our code is based on SAN, CLIP, CLIP Surgery, Mask2Former and ODISE.
We thank the authors for their excellent work!