Open-vocabulary Object Segmentation with Diffusion Models

This repository contains the official PyTorch implementation of Grounded Diffusion, introduced in the paper "Open-vocabulary Object Segmentation with Diffusion Models": https://arxiv.org/abs/2301.05221.

Requirements

A suitable conda environment named grounded-diffusion can be created and activated with:

conda env create -f environment.yaml
conda activate grounded-diffusion

Model Zoo

Model checkpoints are available on Google Drive: https://drive.google.com/drive/folders/1HlagN6jVhmC_UbrOAy133LkN4Qgf2Scv?usp=sharing

Train

Before training, download the checkpoint of the off-the-shelf detector into the folder mmdetection/checkpoint/, then run:

python train.py --class_split 1 --train_data random --save_name pascal_1_random
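
Since training expects the detector checkpoint under mmdetection/checkpoint/, a quick pre-flight check can catch a missing download before a run starts. The following is a minimal sketch, not part of the repository: `find_detector_checkpoints` is a hypothetical helper, and the file suffixes it looks for (.pth, .pt, .ckpt) are an assumption about how the checkpoint is named.

```python
from pathlib import Path


def find_detector_checkpoints(checkpoint_dir="mmdetection/checkpoint"):
    """Return checkpoint files found in checkpoint_dir, sorted by path.

    Hypothetical helper: creates the folder if it does not exist yet,
    so the download target is always in place.
    """
    folder = Path(checkpoint_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Assumption: the detector checkpoint uses a common PyTorch suffix.
    suffixes = {".pth", ".pt", ".ckpt"}
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix in suffixes)
```

Calling `find_detector_checkpoints()` from the repository root returns an empty list when nothing has been downloaded yet, which is a cue to fetch the detector checkpoint before running train.py.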

Inference

python test.py --sd_ckpt 'xxx/stable_diffusion.ckpt' \
--grounding_ckpt 'xxx/grounding_module.pth' \
--prompt "a photo of a lion on a mountain top at sunset" \
--category "lion"
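
To segment several prompt/category pairs in one go, the command above can be assembled programmatically. This is a minimal sketch, assuming test.py accepts exactly the four flags shown above; `build_inference_command` and `run_batch` are hypothetical helpers that are not part of the repository.

```python
import subprocess
import sys


def build_inference_command(sd_ckpt, grounding_ckpt, prompt, category):
    """Assemble one test.py invocation as an argument list.

    Passing a list (rather than a shell string) avoids quoting issues
    with prompts that contain spaces.
    """
    return [
        sys.executable, "test.py",
        "--sd_ckpt", sd_ckpt,
        "--grounding_ckpt", grounding_ckpt,
        "--prompt", prompt,
        "--category", category,
    ]


def run_batch(sd_ckpt, grounding_ckpt, jobs):
    """Run test.py once per (prompt, category) pair; stop on the first failure."""
    for prompt, category in jobs:
        cmd = build_inference_command(sd_ckpt, grounding_ckpt, prompt, category)
        subprocess.run(cmd, check=True)
```

For example, `run_batch('xxx/stable_diffusion.ckpt', 'xxx/grounding_module.pth', [("a photo of a lion on a mountain top at sunset", "lion")])` reproduces the single command above, with the checkpoint paths left as placeholders to be filled in.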

Citation

If you use this code for your research or project, please cite:

@inproceedings{li2023grounded,
  title     = {Open-vocabulary Object Segmentation with Diffusion Models},
  author    = {Li, Ziyi and Zhou, Qinye and Zhang, Xiaoyun and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2023}
}

Acknowledgements

Many thanks to the Stable Diffusion, CLIP, and taming-transformers code bases.