
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]

Framework: PyTorch

[Project Page] [arXiv] [PDF] [Suppli] [Slides] [BibTeX]

<p align="center"> <img src="figs/ovis-gif5.gif" width="800"/> </p>

Contributions

Environment

UBUNTU="18.04"
CUDA="11.0"
CUDNN="8"
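The CUDA extensions built below are compiled against the local toolkit, so it can help to confirm that `nvcc` matches the CUDA version listed above before building. This is a minimal sketch that assumes `nvcc --version` prints a line containing `release X.Y`; the helper names are hypothetical and not part of this repository.

```python
import re
import shutil
import subprocess
from typing import Optional

EXPECTED_CUDA = "11.0"  # value from the Environment section above


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the 'X.Y' release string found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def check_cuda(expected: str = EXPECTED_CUDA) -> bool:
    """Run nvcc if it is on PATH and report whether its release equals `expected`."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
        return False
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    release = parse_nvcc_release(out)
    print(f"nvcc release: {release}, expected: {expected}")
    return release == expected


if __name__ == "__main__":
    check_cuda()
```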

Pseudo-mask Generator Pipeline

<p align="center"> <img src="figs/ovis.png" width="800"/> </p>

Installation

conda create --name pseduo_mask_gen

conda activate pseduo_mask_gen

bash pseduo_mask_gen.sh

Preparation

Generate Pseudo-mask

python pseudo_mask_generator.py
python prepare_coco_dataset.py

# CLIP is required for the next step:
# pip install git+https://github.com/openai/CLIP.git
python prepare_clip_embedding_for_open_vocab.py
python visualize_coco_style_dataset.py
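`prepare_coco_dataset.py` packages the generated pseudo-masks as a COCO-style dataset. Its exact output is not documented here, so as a rough illustration of the COCO annotation format only, the sketch below converts one binary mask into an annotation with an uncompressed RLE `segmentation` (which `pycocotools` can decode), a pixel `area`, and a `bbox` in `[x, y, width, height]` order. The function name and its inputs are hypothetical, not the script's actual code.

```python
from typing import Dict, List


def mask_to_coco_annotation(mask: List[List[int]], ann_id: int, image_id: int,
                            category_id: int) -> Dict:
    """Convert a binary mask (rows of 0/1) into a COCO-style annotation dict."""
    height, width = len(mask), len(mask[0])

    # COCO RLE runs are counted in column-major (Fortran) order and always
    # start with a run of zeros, which may have length 0.
    counts: List[int] = []
    current, run = 0, 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value == current:
                run += 1
            else:
                counts.append(run)
                current, run = value, 1
    counts.append(run)

    # Tight box around the foreground pixels, in [x, y, w, h] format.
    xs = [x for y in range(height) for x in range(width) if mask[y][x]]
    ys = [y for y in range(height) for x in range(width) if mask[y][x]]
    if xs:
        x0, y0 = min(xs), min(ys)
        bbox = [x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1]
    else:
        bbox = [0, 0, 0, 0]

    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": {"counts": counts, "size": [height, width]},
        "area": len(xs),  # number of foreground pixels
        "bbox": bbox,
        "iscrowd": 0,
    }
```

For the 3x3 mask `[[0, 0, 0], [0, 1, 1], [0, 1, 0]]`, the RLE counts come out as `[4, 2, 1, 1, 1]` (column-major runs, starting with background), the box as `[1, 1, 2, 2]`, and the area as 3.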

Pseudo-mask Training Pipeline

Installation

conda create --name maskfree_ovis
conda activate maskfree_ovis
cd $INSTALL_DIR
bash ovis.sh

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

cd ../
cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
python setup.py build develop
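The `perl` one-liner patches the two deformable-convolution CUDA sources in `maskrcnn_benchmark` so that they compile against newer PyTorch releases, which no longer provide the `AT_CHECK` macro; `TORCH_CHECK` is its replacement. For environments without Perl, the sketch below performs the same in-place edit in Python. Like `s/AT_CHECK/TORCH_CHECK/` without the `g` flag, it replaces only the first occurrence on each line. The helper name is hypothetical.

```python
from pathlib import Path

CUDA_DIR = Path("maskrcnn_benchmark/csrc/cuda")
FILES = ["deform_pool_cuda.cu", "deform_conv_cuda.cu"]


def patch_at_check(path: Path) -> int:
    """Rewrite AT_CHECK -> TORCH_CHECK in place (first match per line).

    Returns the number of lines that were changed.
    """
    lines = path.read_text().splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        if "AT_CHECK" in line:
            lines[i] = line.replace("AT_CHECK", "TORCH_CHECK", 1)
            changed += 1
    path.write_text("".join(lines))
    return changed


if __name__ == "__main__":
    for name in FILES:
        target = CUDA_DIR / name
        if target.is_file():
            print(f"{target}: {patch_at_check(target)} line(s) patched")
        else:
            print(f"{target}: not found (run from the repository root)")
```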

Data Preparation

Pretrain with Pseudo-Labels

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/pretrain_pseduo_mask.yaml OUTPUT_DIR $OUTPUT_DIR
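The launch command starts one training process per GPU (`--nproc_per_node=8`), and trailing `KEY VALUE` pairs such as `OUTPUT_DIR $OUTPUT_DIR` override entries of the YAML config, following the yacs convention used by maskrcnn-benchmark. With fewer GPUs, maskrcnn-benchmark recommends rescaling the solver settings by the linear scaling rule. The sketch below computes such overrides; the reference values are placeholders, not the contents of `configs/pretrain_pseduo_mask.yaml`, so substitute the real ones from that file. The function names are hypothetical.

```python
import shlex
from typing import List, Sequence


def scaled_solver_overrides(num_gpus: int,
                            ref_gpus: int = 8,
                            ref_ims_per_batch: int = 16,      # placeholder
                            ref_base_lr: float = 0.02,        # placeholder
                            ref_max_iter: int = 90000,        # placeholder
                            ref_steps: Sequence[int] = (60000, 80000)) -> List[str]:
    """Scale solver settings from `ref_gpus` to `num_gpus` (linear scaling rule).

    Batch size and learning rate scale by num_gpus / ref_gpus; iteration counts
    scale by the inverse, so training still sees the same number of images.
    """
    if ref_ims_per_batch % ref_gpus:
        raise ValueError("reference batch must split evenly across reference GPUs")
    ims_per_batch = ref_ims_per_batch // ref_gpus * num_gpus  # images per GPU stay fixed
    base_lr = ref_base_lr * num_gpus / ref_gpus
    max_iter = round(ref_max_iter * ref_gpus / num_gpus)
    steps = [round(s * ref_gpus / num_gpus) for s in ref_steps]
    return [
        "SOLVER.IMS_PER_BATCH", str(ims_per_batch),
        "SOLVER.BASE_LR", f"{base_lr:g}",
        "SOLVER.MAX_ITER", str(max_iter),
        # yacs parses this string back into a tuple.
        "SOLVER.STEPS", "(" + ", ".join(str(s) for s in steps) + ")",
    ]


def launch_command(config_file: str, output_dir: str, num_gpus: int) -> str:
    """Build a shell command mirroring the pretraining launch line above."""
    argv = ["python", "-m", "torch.distributed.launch", f"--nproc_per_node={num_gpus}",
            "tools/train_net.py", "--distributed", "--config-file", config_file,
            "OUTPUT_DIR", output_dir]
    if num_gpus != 8:
        argv += scaled_solver_overrides(num_gpus)
    return shlex.join(argv)  # quotes arguments such as the STEPS tuple


if __name__ == "__main__":
    print(launch_command("configs/pretrain_pseduo_mask.yaml", "output/pretrain", num_gpus=2))
```

Passing a concrete directory instead of the literal string `$OUTPUT_DIR` matters here, because `shlex.join` quotes its arguments and would stop the shell from expanding the variable.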

Finetune

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/finetune.yaml MODEL.WEIGHT $PATH_TO_PRETRAIN_MODEL  OUTPUT_DIR $OUTPUT_DIR
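`MODEL.WEIGHT` should point at a checkpoint written by the pretraining run. Assuming this codebase keeps maskrcnn-benchmark's checkpointer conventions (a `model_final.pth` saved at the end of training, and a `last_checkpoint` file in `OUTPUT_DIR` recording the most recent save), a hypothetical helper like the one below could pick the path to use for `$PATH_TO_PRETRAIN_MODEL`.

```python
from pathlib import Path
from typing import Optional


def find_pretrain_weight(output_dir: str) -> Optional[Path]:
    """Return the checkpoint to pass as MODEL.WEIGHT, or None if none is found.

    Assumes maskrcnn-benchmark-style outputs: `model_final.pth` at the end of
    training, plus a `last_checkpoint` text file naming the latest save.
    """
    out = Path(output_dir)
    final = out / "model_final.pth"
    if final.is_file():
        return final
    tag = out / "last_checkpoint"
    if tag.is_file():
        # The tag stores the path as written at save time; a relative path
        # resolves against the current working directory.
        saved = Path(tag.read_text().strip())
        if saved.is_file():
            return saved
        # Fall back to the same file name inside output_dir.
        if (out / saved.name).is_file():
            return out / saved.name
    return None


if __name__ == "__main__":
    import sys
    weight = find_pretrain_weight(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(weight if weight else "no checkpoint found")
```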

Inference

Coming soon.

Citation

If you find Mask-free OVIS useful in your research, please consider starring ⭐ us on GitHub and citing 📚 our paper!

@inproceedings{vs2023mask,
  title={Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations},
  author={VS, Vibashan and Yu, Ning and Xing, Chen and Qin, Can and Gao, Mingfei and Niebles, Juan Carlos and Patel, Vishal M and Xu, Ran},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={23539--23549},
  year={2023}
}

Acknowledgement

The codebase is built on top of PB-OVD, CMPL, and Wetectron.