Awesome

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]

[Project Page] [arXiv] [PDF] [Suppli] [Slides] [BibTeX]

Contributions

Utilize a pre-trained Vision-Language model to identify and localize the object of interest using GradCAM.
Employ a weakly-supervised proposal generator to generate bounding box proposals and select the proposal with the highest overlap with the GradCAM map.
Crop the image based on the selected proposal and leverage the GradCAM map as a weak prompt to extract a mask using a weakly-supervised segmentation network.
Using these generated masks, train an instance segmentation model (Mask-RCNN) eliminating the need for human-provided box-level or pixel-level annotations.

Environment

UBUNTU="18.04"
CUDA="11.0"
CUDNN="8"

Pseudo-mask Generator Pipeline

Installation

conda create --name pseduo_mask_gen

conda activate pseduo_mask_gen

bash pseduo_mask_gen.sh

Preparation

Referring examples/README.md for data preparation

Generate Pseudo-mask

Get pseudo labels based on ALBEF and generated pseudo-mask are available here

python pseudo_mask_generator.py

Organize dataset in COCO format

python prepare_coco_dataset.py

Extract text embedding using CLIP

# pip install git+https://github.com/openai/CLIP.git

python prepare_clip_embedding_for_open_vocab.py

Check your final pseudo-mask by visualization

python visualize_coco_style_dataset.py

Pseudo-mask Training Pipeline

Installation

conda create --name maskfree_ovis
conda activate maskfree_ovis
cd $INSTALL_DIR
bash ovis.sh

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

cd ../
cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
python setup.py build develop

Data Preparation

Follow steps in datasets/README.md for data preparation

Pretrain with Pseudo-Labels

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/pretrain_pseduo_mask.yaml OUTPUT_DIR $OUTPUT_DIR

Finetune

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/finetune.yaml MODEL.WEIGHT $PATH_TO_PRETRAIN_MODEL  OUTPUT_DIR $OUTPUT_DIR

Inference

'Coming Soon...!!!'

Citation

If you found Mask-free OVIS useful in your research, please consider starring ⭐ us on GitHub and citing 📚 us in your research!

@inproceedings{vs2023mask,
  title={Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations},
  author={VS, Vibashan and Yu, Ning and Xing, Chen and Qin, Can and Gao, Mingfei and Niebles, Juan Carlos and Patel, Vishal M and Xu, Ran},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={23539--23549},
  year={2023}
}

Acknowledgement

The codebase is build on top of PB-OVD, CMPL and Wetectron.