[CVPR2024] Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

We release our code and trained models for our CVPR 2024 paper, Open-Vocabulary Semantic Segmentation with Image Embedding Balancing.

Getting started

Environment setup

First, clone this repo:

git clone https://github.com/slonetime/EBSeg.git

Then, create a new conda environment and install the required packages:

cd EBSeg
conda create --name ebseg python=3.9
conda activate ebseg
pip install -r requirements.txt
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
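
To verify that the environment is set up correctly, you can run a quick import check (this is only a sanity test, not part of the official setup):

python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__)"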

Finally, compile and install the MultiScaleDeformableAttention op from Mask2Former:

cd ebseg/model/mask2former/modeling/pixel_decoder/ops/
sh make.sh 
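
If the build succeeds, the compiled extension should be importable. As a quick sanity check (this assumes a CUDA-capable build environment):

python -c "import torch, MultiScaleDeformableAttention; print('MSDeformAttn op built successfully')"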

Data preparation

We follow the dataset preparation process in SAN, so please follow the instructions in https://github.com/MendelXu/SAN?tab=readme-ov-file#data-preparation.
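
SAN (and therefore EBSeg) follows the standard Detectron2 dataset layout, where prepared datasets are exposed through the DETECTRON2_DATASETS environment variable. The directory names below follow the usual Detectron2 conventions and are shown only as a reference; please follow the SAN instructions for the exact structure:

export DETECTRON2_DATASETS=/path/to/datasets
# Typical sub-directories after preparation (conventional Detectron2 names):
#   $DETECTRON2_DATASETS/ADEChallengeData2016/   # ADE20K-150 (A-150)
#   $DETECTRON2_DATASETS/ADE20K_2021_17_01/      # ADE20K-847 (A-847)
#   $DETECTRON2_DATASETS/VOCdevkit/              # PASCAL VOC (VOC) and PASCAL Context (PC-59 / PC-459)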

Training

First, set the config_file path, dataset_dir path and output_dir path in train.sh. Then you can train an EBSeg model with the following command:

bash train.sh
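
For reference, a minimal sketch of what train.sh typically contains is shown below. The entry-point script name (train_net.py), config name and paths are assumptions for illustration; use the ones that ship with this repo:

# All names and paths below are placeholders; edit them to match your setup.
export DETECTRON2_DATASETS=/path/to/datasets      # dataset_dir
config_file=configs/ebseg_base.yaml               # config_file (hypothetical name)
output_dir=output/ebseg_base                      # output_dir
python train_net.py --config-file $config_file --num-gpus 8 OUTPUT_DIR $output_dir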

Inference with our trained model

Download our trained models from the links in the following table (numbers are mIoU):

Model    | A-847 | PC-459 | A-150 | PC-59 | VOC
EBSeg-B  | 11.1  | 17.3   | 30.0  | 56.7  | 94.6
EBSeg-L  | 13.7  | 21.0   | 32.8  | 60.2  | 96.4

As for training, set the config_file path, dataset_dir path, checkpoint path and output_dir path in test.sh. Then, test an EBSeg model with:

bash test.sh
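
As with training, the evaluation command in test.sh typically follows the Detectron2 eval-only convention; the sketch below reuses the same hypothetical names as the training example:

# All names and paths below are placeholders; edit them to match your setup.
export DETECTRON2_DATASETS=/path/to/datasets      # dataset_dir
config_file=configs/ebseg_base.yaml               # config_file (hypothetical name)
checkpoint=checkpoints/ebseg_base.pth             # downloaded checkpoint
output_dir=output/eval_ebseg_base                 # output_dir
python train_net.py --config-file $config_file --num-gpus 8 --eval-only \
    MODEL.WEIGHTS $checkpoint OUTPUT_DIR $output_dir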

Acknowledgments

Our code is based on SAN, CLIP, CLIP Surgery, Mask2Former and ODISE.

We thank them for their excellent work!