Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

This is the official repository of the following paper:

Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation<br> NeurIPS 2023<br> Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu<br>

Updates

Environment Setup

conda create -n rewrite python=3.7 -y
conda activate rewrite
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full==1.3.14 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
pip install -r requirements.txt
git clone https://github.com/ptrblck/apex.git
cd apex && pip install -v --no-cache-dir ./
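
After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it assumes that `mmcv-full` is imported as `mmcv` and that each package exposes `__version__`.

```python
import importlib

# Versions pinned by the setup commands above, keyed by import name.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0", "mmcv": "1.3.14"}


def check_environment(expected=EXPECTED):
    """Return {module: (wanted, found, ok)} for each pinned package."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken installation: record it instead of crashing.
            report[name] = (wanted, None, False)
            continue
        found = getattr(module, "__version__", "unknown")
        # Ignore local build tags such as "+cu111" when comparing.
        report[name] = (wanted, found, found.split("+")[0] == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {wanted}, found {found or 'not installed'} -> {status}")
```

Run it inside the activated `rewrite` environment; any `MISMATCH` line points at a package to reinstall.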

Run

Curation

CONFIG='configs/train/cocu_clip-vit-b-16_8_c3_30e.yml'
DATA='c3'
MODEL='clip-vit-b-16'
# Turn a set of image-caption pairs into CLIP embeddings.
bash scripts/inference.sh --data ${DATA} --model ${MODEL}
# Build a search index from the CLIP embeddings.
bash scripts/index.sh --data ${DATA} --model ${MODEL}
# Rewrite the semantics of the image captions.
python rewrite/curation.py --data ${DATA} --model ${MODEL}
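
To make the "embeddings, then search index" idea concrete, here is a toy sketch of nearest-neighbour lookup by cosine similarity. It is only an illustration under stated assumptions and not the repository's implementation: the class `BruteForceIndex`, the captions, and the 3-dimensional vectors are all hypothetical (real CLIP ViT-B/16 embeddings are 512-dimensional), and the actual scripts build on clip-retrieval rather than a brute-force scan.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class BruteForceIndex:
    """Toy stand-in for a real nearest-neighbour index (hypothetical helper)."""

    def __init__(self, embeddings):
        self.vectors = [normalize(v) for v in embeddings]

    def search(self, query, k=1):
        """Return the k (position, similarity) pairs most similar to the query."""
        q = normalize(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(i, scores[i]) for i in order[:k]]


# Made-up captions and embeddings, for illustration only.
captions = ["a dog on grass", "a cat on a sofa", "a red car"]
embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

index = BruteForceIndex(embeddings)
top = index.search([0.8, 0.2, 0.1], k=2)
nearest_captions = [captions[i] for i, _ in top]
```

A brute-force scan is linear in the number of stored embeddings, which is why large caption sets are usually served from a dedicated index instead.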

Pre-train

./tools/dist_launch.sh main_group_vit.py ${CONFIG} 4

Citation

Please consider citing our paper if you find our work useful.

@inproceedings{xing2023rewrite,
    title={Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation}, 
    author={Yun Xing and Jian Kang and Aoran Xiao and Jiahao Nie and Ling Shao and Shijian Lu},
    booktitle={Advances in Neural Information Processing Systems},
    year={2023},
}

Acknowledgement

The repo is built on GroupViT and clip-retrieval.