Referring Image Segmentation Using Text Supervision
Official PyTorch implementation of TRIS, from the following paper:
Referring Image Segmentation Using Text Supervision. ICCV 2023.
Fang Liu*, Yuhao Liu*, Yuqiu Kong, Ke Xu, Lihe Zhang, Baocai Yin, Gerhard Hancke, Rynson Lau
<p align="left"> <img src="figs/pipeline.png" class="center"> </p>
Environment
We recommend running the code with <b>PyTorch 1.13.1</b> or higher.
<!-- ```bash conda env create -f environment.yml ``` -->
Dataset
RefCOCO/+/g
├── data/
| ├── train2014
| ├── refer
| | ├── refcocog
| | | ├── instances.json
| | | ├── refs(google).p
| | | ├── refs(umd).p
| | ├── refcoco
ReferIt
├── data/
| ├── referit
| | ├── annotations
| | | ├── train.pickle
| | | ├── test.pickle
| | ├── images
| | ├── masks
To generate the ReferIt annotations yourself, refer to MG for details.
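Before running evaluation or training, it can help to confirm that the data folder matches the trees above. The following is a minimal sketch, not part of the repository: the path lists are copied from the trees in this section, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Expected entries, copied from the directory trees in this README.
REFCOCO_LAYOUT = [
    "train2014",
    "refer/refcocog/instances.json",
    "refer/refcocog/refs(google).p",
    "refer/refcocog/refs(umd).p",
    "refer/refcoco",
]
REFERIT_LAYOUT = [
    "referit/annotations/train.pickle",
    "referit/annotations/test.pickle",
    "referit/images",
    "referit/masks",
]


def missing_entries(data_root, expected):
    """Return the expected relative paths that do not exist under data_root.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for name, layout in [("RefCOCO/+/g", REFCOCO_LAYOUT), ("ReferIt", REFERIT_LAYOUT)]:
        missing = missing_entries("./data", layout)
        print(name, "OK" if not missing else f"missing: {missing}")
```

Only the entries shown in the trees are checked; other RefCOCO variants would need their own entries added to the list.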
Evaluation
Note that we use <b>mIoU</b> to evaluate the accuracy of the generated masks.
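The metric is not spelled out here, so the following is only a minimal sketch. It assumes mIoU means the average of per-sample IoU between binary predicted and ground-truth masks; the repository's own evaluation code may differ, for example in how empty masks are handled. `mask_iou` and `mean_iou` are illustrative names, not functions from this repository.

```python
import numpy as np


def mask_iou(pred, gt):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty: counted as a perfect match (an assumption).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(preds, gts):
    """Average the per-sample IoU over a list of mask pairs."""
    ious = [mask_iou(p, g) for p, g in zip(preds, gts)]
    return sum(ious) / len(ious)


# Two toy 2x2 samples: the first overlaps on 1 of 2 pixels, the second is exact.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
gts = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(mean_iou(preds, gts))  # 0.75
```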
- Create the `./weights` directory:
mkdir ./weights
- Download the model weights from the GitHub links below and put them in `./weights`.
| | ReferIt | RefCOCO | RefCOCO+ | G-Ref (Google) | G-Ref (UMD) |
|---|---|---|---|---|---|
| Step-1 | weight | weight | weight | weight | weight |
| Step-2 | weight | weight | weight | weight | weight |
- Shell script for G-Ref (UMD) evaluation. Replace `refcocog` with `refcoco`, and `umd` with `unc`, for RefCOCO dataset evaluation.
bash scripts/validate_stage1.sh
<!-- ```bash
python validate.py --batch_size 1 --size 320 --dataset refcocog --splitBy umd --test_split val --max_query_len 20 --dataset_root ./data --output weights/ --resume --pretrain stage1_refcocog_umd.pth --eval
```
For ReferIt dataset:
```bash
python validate_referit.py --batch_size 1 --size 320 --dataset referit --test_split test --backbone clip-RN50 --max_query_len 20 --dataset_root ./data/referit/ --output weights/ --resume --pretrain stage1_referit.pth --eval
``` -->
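The dataset swap described above can be sketched as a small command builder. This is an illustration only. The flags are the ones in the `validate.py` invocation kept as a comment in this README's source, and it is an assumption that `scripts/validate_stage1.sh` wraps the same call. `build_validate_cmd` is a hypothetical helper, and the checkpoint name `stage1_refcoco_unc.pth` is an assumed pattern modeled on `stage1_refcocog_umd.pth`.

```python
import shlex

# Dataset name -> --splitBy value, following the replacement rule above.
SPLIT_BY = {"refcocog": "umd", "refcoco": "unc"}


def build_validate_cmd(dataset, checkpoint, test_split="val"):
    """Assemble the validate.py argument list for one dataset (hypothetical helper)."""
    return [
        "python", "validate.py",
        "--batch_size", "1", "--size", "320",
        "--dataset", dataset, "--splitBy", SPLIT_BY[dataset],
        "--test_split", test_split, "--max_query_len", "20",
        "--dataset_root", "./data", "--output", "weights/",
        "--resume", "--pretrain", checkpoint, "--eval",
    ]


# The checkpoint file name below is an assumption, not a documented file.
print(shlex.join(build_validate_cmd("refcoco", "stage1_refcoco_unc.pth")))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.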
Demo
The output of the demo is saved in `./figs/`.
python demo.py --img figs/demo.png --text 'man on the right'
<p align="left">
<img src="figs/demo.png" style="width: 200px; height: auto; ">
<img src="figs/demo_(man on the right).png" style="width: 200px; height: auto;">
</p>
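To locate the demo result programmatically, the output name can be derived from the inputs. This sketch assumes the naming pattern visible in the example above (`demo.png` plus the query gives `demo_(man on the right).png`); the pattern is inferred from that single example rather than confirmed from `demo.py`, and `demo_output_path` is a hypothetical helper.

```python
from pathlib import Path


def demo_output_path(img_path, text, out_dir="./figs"):
    """Guess where demo.py writes its result, based on the example in this README."""
    img = Path(img_path)
    # Assumed pattern: <image stem>_(<query text>)<image suffix>
    return Path(out_dir) / f"{img.stem}_({text}){img.suffix}"


print(demo_output_path("figs/demo.png", "man on the right").as_posix())
```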
Training
- Train the Step-1 network on the G-Ref (UMD) dataset.
bash scripts/train_stage1.sh
<!-- ```bash
python train_stage1.py --batch_size 48 --size 320 --dataset refcocog --splitBy umd --test_split val --epoch 15 --backbone clip-RN50 --max_query_len 20 --negative_samples 3 --output ./weights/refcocog_umd --board_folder ./output/board
``` -->
- Validate and generate response maps on the G-Ref (UMD) `train` set, based on the proposed PRMS strategy (`--prms`). The response maps are saved in `./output/refcocog_umd/cam/`, as specified by the `--cam_save_dir` argument.
## path to save response maps and pseudo labels
dir=./output
python validate.py --batch_size 1 --size 320 \
--dataset refcocog --splitBy umd --test_split train \
--max_query_len 20 --output ./weights/ --resume \
--pretrain stage1_refcocog_umd.pth --cam_save_dir $dir/refcocog_umd/cam/ \
--name_save_dir $dir/refcocog_umd --eval --prms --save_cam
- Train IRNet and generate pseudo masks.
cd IRNet
dir=../output
## single GPU
CUDA_VISIBLE_DEVICES=0 python run_sample_refer.py \
--voc12_root ../../../work/datasets/train2014 \
--cam_out_dir $dir/refcocog_umd/cam \
--ir_label_out_dir $dir/refcocog_umd/ir_label \
--ins_seg_out_dir $dir/refcocog_umd/ins_seg \
--cam_eval_thres 0.15 \
--work_space output_refer/refcocog_umd \
--train_list $dir/refcocog_umd/refcocog_train_names.json \
--num_workers 2 \
--irn_batch_size 24 \
--cam_to_ir_label_pass True \
--train_irn_pass True \
--make_ins_seg_pass True
## the code can run faster if more GPUs are available
#CUDA_VISIBLE_DEVICES=0,1,2,3 python run_sample_refer.py --cam_out_dir $dir/refcocog_umd/cam --ir_label_out_dir $dir/refcocog_umd/ir_label --ins_seg_out_dir $dir/refcocog_umd/ins_seg --train_list $dir/refcocog_umd/refcocog_train_names.json --cam_eval_thres 0.15 --work_space output_refer/refcocog_umd --num_workers 8 --irn_batch_size 96 --cam_to_ir_label_pass True --train_irn_pass True --make_ins_seg_pass True
- Train the Step-2 network using the generated pseudo masks in `output/refcocog_umd/ins_seg`, as specified by the `--pseudo_path` argument.
cd ../
bash scripts/train_stage2.sh
## python train_stage2.py --batch_size 48 --size 320 --dataset refcocog --splitBy umd --test_split val --bert_tokenizer clip --backbone clip-RN50 --max_query_len 20 --epoch 15 --pseudo_path output/refcocog_umd/ins_seg --output ./weights/stage2/pseudo_refcocog_umd
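The training steps above turn response maps into pseudo masks. As a rough intuition for that conversion, here is a minimal sketch that thresholds a min-max normalized response map. It is not IRNet's procedure, which refines the maps further before producing masks; the threshold default simply reuses the `--cam_eval_thres 0.15` value from the command above, and `cam_to_mask` is a hypothetical helper.

```python
import numpy as np


def cam_to_mask(cam, thres=0.15):
    """Binarize a response map by min-max normalizing it and thresholding.

    Simplified illustration only; IRNet does considerably more than this.
    """
    cam = np.asarray(cam, dtype=np.float32)
    lo, hi = cam.min(), cam.max()
    if hi - lo < 1e-8:
        # A flat map carries no localization signal: return an empty mask.
        return np.zeros(cam.shape, dtype=bool)
    norm = (cam - lo) / (hi - lo)
    return norm > thres


toy_cam = [[0.0, 0.1], [0.5, 1.0]]
print(cam_to_mask(toy_cam))
```

A lower threshold grows the foreground region and a higher one shrinks it, which is why the value is worth tuning per dataset.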
Acknowledgement
This repository is based on LAVT, WWbL, CLIMS, and IRNet.
Citation
If you find this repository helpful, please consider citing:
@inproceedings{liu2023referring,
title={Referring Image Segmentation Using Text Supervision},
author={Liu, Fang and Liu, Yuhao and Kong, Yuqiu and Xu, Ke and Zhang, Lihe and Yin, Baocai and Hancke, Gerhard and Lau, Rynson},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={22124--22134},
year={2023}
}
Contact
If you have any questions, please feel free to reach out at fawnliu2333@gmail.com.