MasQCLIP for Open-Vocabulary Universal Image Segmentation
Xin Xu*, Tianyi Xiong*, Zheng Ding and Zhuowen Tu (*Equal Contribution)
This is the repository for MasQCLIP for Open-Vocabulary Universal Image Segmentation, published at ICCV 2023.
[Project Page] [Paper]
Dataset
Please refer to dataset preparation.
Training and Testing
Please refer to installation instructions for environment setup.
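All commands below write checkpoints and logs under ${work_dir}. This variable is not set anywhere by the scripts, so define it first; the path here is only an example:
# ${work_dir} is a user-chosen output root; any writable directory works
work_dir=./output/masqclip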
Base-novel Setting
In the base-novel setting, the model is trained on the base classes and tested on the novel classes. To train a model under the base-novel setting (on COCO-instance), run:
# Progressive Distillation
python train_net.py --num-gpus 8 --config-file configs/base-novel/coco-instance/teacher_R50_100k_base48.yaml OUTPUT_DIR "${work_dir}/teacher"
python train_net.py --num-gpus 4 --config-file configs/base-novel/coco-instance/student_R50_30k_base48.yaml OUTPUT_DIR "${work_dir}/student" MODEL.WEIGHTS "${work_dir}/teacher/model_final.pth"
# MasQ-Tuning
python train_net.py --num-gpus 4 --config-file configs/base-novel/coco-instance/masqclip_R50_bs4_10k_base48.yaml OUTPUT_DIR "${work_dir}/masq" MODEL.WEIGHTS "${work_dir}/student/model_final.pth"
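The commands above assume 8 and 4 GPUs. With fewer GPUs, detectron2-style configs (which Mask2Former-based code uses) let you override the batch size on the command line; a minimal sketch, assuming the configs expose the standard SOLVER.IMS_PER_BATCH key:
# Sketch: teacher stage on 2 GPUs with a smaller batch (the learning rate
# and iteration schedule may need rescaling to match)
python train_net.py --num-gpus 2 --config-file configs/base-novel/coco-instance/teacher_R50_100k_base48.yaml OUTPUT_DIR "${work_dir}/teacher" SOLVER.IMS_PER_BATCH 4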
To evaluate the trained model in the generalized setting (base and novel classes together), use:
python train_net.py --eval-only --num-gpus 4 --config-file configs/base-novel/coco-instance/masqclip_R50_bs4_10k_instance65.yaml OUTPUT_DIR "${work_dir}/generalized" MODEL.WEIGHTS "${work_dir}/masq/model_final.pth"
Cross-dataset Setting
In the cross-dataset setting, the model is trained on one dataset (e.g., COCO) and tested on another (e.g., ADE20K). To train a model under the cross-dataset setting (on COCO-panoptic), run:
# Progressive Distillation
python train_net.py --num-gpus 8 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/teacher_R50_200k.yaml OUTPUT_DIR "${work_dir}/train_coco/teacher"
python train_net.py --num-gpus 4 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/student_R50_30k.yaml OUTPUT_DIR "${work_dir}/train_coco/student" MODEL.WEIGHTS "${work_dir}/train_coco/teacher/model_final.pth"
# MasQ-Tuning
python train_net.py --num-gpus 4 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/train_coco/masq" MODEL.WEIGHTS "${work_dir}/train_coco/student/model_final.pth"
To evaluate the trained model on a target dataset, use:
model_path="${work_dir}/train_coco/masq/model_final.pth"
# For example, to evaluate on ADE20K-150, use
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/ade20k-150/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/test_ade20k_150" MODEL.WEIGHTS $model_path
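The same pattern extends to other test sets; a minimal sketch, where every list entry other than ade20k-150 must be checked against the directories actually present under configs/cross-dataset/test/:
# Sketch: loop evaluation over test sets; add dataset directory names from
# configs/cross-dataset/test/ as needed
for dataset in ade20k-150; do
    python train_net.py --eval-only --num-gpus 4 --config-file "configs/cross-dataset/test/${dataset}/panoptic-segmentation/masqclip_R50_bs4_10k.yaml" OUTPUT_DIR "${work_dir}/test_${dataset}" MODEL.WEIGHTS "${model_path}"
done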
Pre-trained Models
Pre-trained models can be found in this Google Drive link.
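A downloaded checkpoint plugs into any of the evaluation commands above through MODEL.WEIGHTS; for example (the local weights path below is a placeholder):
# Sketch: evaluate a downloaded checkpoint on ADE20K-150 panoptic
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/ade20k-150/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/test_ade20k_150" MODEL.WEIGHTS /path/to/model_final.pth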
Acknowledgment
The code is based on MaskCLIP, Mask2Former and CLIP.
Citation
Please consider citing MasQCLIP and MaskCLIP if you find the code useful:
@inproceedings{xu2023masqclip,
    author    = {Xu, Xin and Xiong, Tianyi and Ding, Zheng and Tu, Zhuowen},
    title     = {MasQCLIP for Open-Vocabulary Universal Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {887--898},
}
@inproceedings{ding2023maskclip,
    author    = {Ding, Zheng and Wang, Jieke and Tu, Zhuowen},
    title     = {Open-Vocabulary Universal Image Segmentation with MaskCLIP},
    booktitle = {International Conference on Machine Learning},
    year      = {2023},
}