MasQCLIP for Open-Vocabulary Universal Image Segmentation
Xin Xu*, Tianyi Xiong*, Zheng Ding and Zhuowen Tu (*Equal Contribution)
This is the repository for MasQCLIP for Open-Vocabulary Universal Image Segmentation, published at ICCV 2023.
[Project Page] [Paper]
Dataset
Please refer to dataset preparation.
Training and Testing
Please refer to installation instructions for environment setup.
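All commands below write checkpoints and logs under ${work_dir}. This variable is not set anywhere by the scripts, so define it first; the path here is only an example:
# ${work_dir} is a user-chosen output root; any writable directory works
work_dir=./output/masqclip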
Base-novel Setting
In the base-novel setting, the model is trained on the base classes and tested on the novel classes. To train a model under the base-novel setting (on COCO-instance), run:
# Progressive Distillation
python train_net.py --num-gpus 8 --config-file configs/base-novel/coco-instance/teacher_R50_100k_base48.yaml OUTPUT_DIR "${work_dir}/teacher"
python train_net.py --num-gpus 4 --config-file configs/base-novel/coco-instance/student_R50_30k_base48.yaml OUTPUT_DIR "${work_dir}/student" MODEL.WEIGHTS "${work_dir}/teacher/model_final.pth"
# MasQ-Tuning
python train_net.py --num-gpus 4 --config-file configs/base-novel/coco-instance/masqclip_R50_bs4_10k_base48.yaml OUTPUT_DIR "${work_dir}/masq" MODEL.WEIGHTS "${work_dir}/student/model_final.pth"
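The commands above assume 8 and 4 GPUs. With fewer GPUs, detectron2-style configs (which Mask2Former-based code uses) let you override the batch size on the command line; a minimal sketch, assuming the configs expose the standard SOLVER.IMS_PER_BATCH key:
# Sketch: teacher stage on 2 GPUs with a smaller batch (the learning rate
# and iteration schedule may need rescaling to match)
python train_net.py --num-gpus 2 --config-file configs/base-novel/coco-instance/teacher_R50_100k_base48.yaml OUTPUT_DIR "${work_dir}/teacher" SOLVER.IMS_PER_BATCH 4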
To evaluate the trained model in the generalized setting (base and novel classes together), use:
python train_net.py --eval-only --num-gpus 4 --config-file configs/base-novel/coco-instance/masqclip_R50_bs4_10k_instance65.yaml OUTPUT_DIR "${work_dir}/generalized" MODEL.WEIGHTS "${work_dir}/masq/model_final.pth"
Cross-dataset Setting
In the cross-dataset setting, the model is trained on one dataset (e.g., COCO) and tested on another (e.g., ADE20K). To train a model under the cross-dataset setting (on COCO-panoptic), run:
# Progressive Distillation
python train_net.py --num-gpus 8 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/teacher_R50_200k.yaml OUTPUT_DIR "${work_dir}/train_coco/teacher"
python train_net.py --num-gpus 4 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/student_R50_30k.yaml OUTPUT_DIR "${work_dir}/train_coco/student" MODEL.WEIGHTS "${work_dir}/train_coco/teacher/model_final.pth"
# MasQ-Tuning
python train_net.py --num-gpus 4 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/train_coco/masq" MODEL.WEIGHTS "${work_dir}/train_coco/student/model_final.pth"
To evaluate the trained model on a target dataset, use:
model_path="${work_dir}/train_coco/masq/model_final.pth"
# For example, to evaluate on ADE20K-150, use
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/ade20k-150/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/test_ade20k_150" MODEL.WEIGHTS $model_path
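The same pattern extends to other test sets; a minimal sketch, where every list entry other than ade20k-150 must be checked against the directories actually present under configs/cross-dataset/test/:
# Sketch: loop evaluation over test sets; add dataset directory names from
# configs/cross-dataset/test/ as needed
for dataset in ade20k-150; do
    python train_net.py --eval-only --num-gpus 4 --config-file "configs/cross-dataset/test/${dataset}/panoptic-segmentation/masqclip_R50_bs4_10k.yaml" OUTPUT_DIR "${work_dir}/test_${dataset}" MODEL.WEIGHTS "${model_path}"
done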
Pre-trained Models
Pre-trained models can be found in this Google Drive link.
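A downloaded checkpoint plugs into any of the evaluation commands above through MODEL.WEIGHTS; for example (the local weights path below is a placeholder):
# Sketch: evaluate a downloaded checkpoint on ADE20K-150 panoptic
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/ade20k-150/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/test_ade20k_150" MODEL.WEIGHTS /path/to/model_final.pth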
Acknowledgment
The code is based on MaskCLIP, Mask2Former and CLIP.
Citation
Please consider citing MasQCLIP and MaskCLIP if you find the code useful:
@inproceedings{xu2023masqclip,
    author    = {Xu, Xin and Xiong, Tianyi and Ding, Zheng and Tu, Zhuowen},
    title     = {MasQCLIP for Open-Vocabulary Universal Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {887--898},
}
@inproceedings{ding2023maskclip,
    author    = {Ding, Zheng and Wang, Jieke and Tu, Zhuowen},
    title     = {Open-Vocabulary Universal Image Segmentation with MaskCLIP},
    booktitle = {International Conference on Machine Learning},
    year      = {2023},
}