

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling


This repository contains the implementation of Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling.

In this work, we address open-vocabulary instance segmentation, which learns to segment novel objects without any mask annotations during training by generating pseudo masks from captioned images.



Our code is based on OVR, which is built on maskrcnn-benchmark. To set up the code, please follow the instructions in INSTALL.md.


To download the datasets, please follow the instructions below.

For more information on how the data directory is structured, please refer to maskrcnn_benchmark/config/paths_catalog.py.

MS-COCO


python ./preprocess/coco/construct_coco_json.py

Open Images & Conceptual Captions

Annotations for Open Images

cd ./preprocess/openimages/openimages2coco
python convert_annotations.py -p ../../../datasets/openimages/ --version challenge_2019 --task mask --subsets train
python convert_annotations.py -p ../../../datasets/openimages/ --version challenge_2019 --task mask --subsets val
python ./preprocess/openimages/construct_openimages_json.py
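
After the conversion steps above, it can help to sanity-check an annotation file before training. The sketch below assumes the converted file follows the standard COCO layout (top-level `images`, `annotations`, and `categories` lists); the function names are hypothetical helpers and are not part of this repository.

```python
import json
from collections import Counter

# Top-level keys expected in a COCO-style annotation file (assumption:
# the converted annotations follow the standard COCO layout).
REQUIRED_KEYS = ("images", "annotations", "categories")


def summarize_coco(coco):
    """Check basic COCO-format invariants and return a small summary dict."""
    missing = [key for key in REQUIRED_KEYS if key not in coco]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")

    image_ids = {image["id"] for image in coco["images"]}
    category_ids = {category["id"] for category in coco["categories"]}

    # Annotations that reference an image or category that is not declared.
    dangling = [
        ann["id"]
        for ann in coco["annotations"]
        if ann["image_id"] not in image_ids
        or ann["category_id"] not in category_ids
    ]
    per_category = Counter(ann["category_id"] for ann in coco["annotations"])
    return {
        "num_images": len(image_ids),
        "num_annotations": len(coco["annotations"]),
        "annotations_per_category": dict(per_category),
        "dangling_annotation_ids": dangling,
    }


def summarize_coco_file(path):
    """Load a JSON annotation file from `path` and summarize it."""
    with open(path) as handle:
        return summarize_coco(json.load(handle))
```

Calling `summarize_coco_file` on a converted annotation file should report an empty `dangling_annotation_ids` list if every annotation points at a declared image and category.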

Annotations for Conceptual Captions

Coming Soon!


To reproduce the main experiments in the paper, we provide the scripts below to train the teacher and student models on both MS-COCO and Open Images & Conceptual Captions. Please note that the teacher must be trained first, because it produces the pseudo labels/masks used to train the student models.

MS-COCO


python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/coco_cap_det/mmss.yaml --skip-test OUTPUT_DIR ./model_weights/model_pretrained.pth
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/coco_cap_det/zeroshot_mask.yaml OUTPUT_DIR ./checkpoint/mscoco_teacher/ MODEL.WEIGHT ./model_weights/model_pretrained.pth
python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/coco_cap_det/student_teacher_mask_rcnn_uncertainty.yaml OUTPUT_DIR ./checkpoint/mscoco_student/ MODEL.WEIGHT ./checkpoint/mscoco_teacher/model_final.pth
python -m torch.distributed.launch --nproc_per_node=8 tools/test_net.py --config-file configs/coco_cap_det/student_teacher_mask_rcnn_uncertainty.yaml OUTPUT_DIR ./results/mscoco_student MODEL.WEIGHT ./pretrained_model/coco_student/model_final.pth
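
The commands above form a chain: each training stage loads the weights produced by the previous one through `MODEL.WEIGHT`. A minimal sketch of a pre-flight check is shown below; it only parses a command string and tests whether the referenced weight file exists. `parse_overrides` and `missing_weight` are hypothetical helpers, not part of this repository, and nothing is launched.

```python
import os
import shlex

# Config overrides that appear as "KEY VALUE" pairs in the commands above.
OVERRIDE_KEYS = ("OUTPUT_DIR", "MODEL.WEIGHT")

# Student training command, copied from the instructions above.
STUDENT_COMMAND = (
    "python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py "
    "--config-file configs/coco_cap_det/student_teacher_mask_rcnn_uncertainty.yaml "
    "OUTPUT_DIR ./checkpoint/mscoco_student/ "
    "MODEL.WEIGHT ./checkpoint/mscoco_teacher/model_final.pth"
)


def parse_overrides(command):
    """Return the OUTPUT_DIR / MODEL.WEIGHT values found in a command line."""
    tokens = shlex.split(command)
    return {
        token: tokens[index + 1]
        for index, token in enumerate(tokens[:-1])
        if token in OVERRIDE_KEYS
    }


def missing_weight(command):
    """Return the MODEL.WEIGHT path if it is set but not on disk, else None."""
    weight = parse_overrides(command).get("MODEL.WEIGHT")
    if weight is not None and not os.path.isfile(weight):
        return weight
    return None
```

For example, `missing_weight(STUDENT_COMMAND)` returns the teacher checkpoint path when the teacher has not been trained yet, which signals that the student stage should not be started.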

Open Images & Conceptual Captions

Coming Soon!

Pretrained Models

Conceptual Caps + Open Images | model | model


If this code is helpful for your research, we would appreciate it if you cite the work:

  author = {D.~Huynh and J.~Kuen and Z.~Lin and J.~Gu and E.~Elhamifar},
  title = {Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling},
  journal = {{IEEE} Conference on Computer Vision and Pattern Recognition},
  year = {2022}}