CARIS: Context-Aware Referring Image Segmentation

This repository is for the ACM MM 2023 paper CARIS: Context-Aware Referring Image Segmentation.

<div align="center"> <img src="figures/network.png" width="600" /> </div>

Requirements

The code is verified with Python 3.8 and PyTorch 1.11. Other dependencies are listed in requirements.txt.
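
A minimal setup sketch, assuming a fresh conda environment (the environment name and the plain pip wheels below are assumptions; any setup providing Python 3.8 and PyTorch 1.11 works):

conda create -n caris python=3.8 -y
conda activate caris
# PyTorch 1.11 with the matching torchvision; pick the build that matches your CUDA version
pip install torch==1.11.0 torchvision==0.12.0
# remaining dependencies
pip install -r requirements.txt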

Datasets

Please follow the instructions in ./refer to download the annotations of RefCOCO/RefCOCO+/RefCOCOg. We provide the combined annotations as refcocom here.

Download images from COCO. Please use the first download link, 2014 Train images [83K/13GB], and extract the downloaded train2014.zip file.
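
For reference, a command-line sketch of this step (the URL is the official COCO link for the 2014 training images; {YOUR_COCO_PATH} is the placeholder used below):

cd {YOUR_COCO_PATH}
wget http://images.cocodataset.org/zips/train2014.zip
unzip train2014.zip   # extracts the train2014/ folder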

Data paths should be as follows:

.{YOUR_REFER_PATH}
├── refcoco
├── refcoco+
├── refcocog
├── refcocom

.{YOUR_COCO_PATH}
├── train2014

Pretrained Models

Download the pretrained Swin-B and BERT-B weights. See models for pretrained CARIS models.
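
A hedged example of fetching the backbones from the command line (the exact Swin-B checkpoint variant and the destination paths expected by the scripts are assumptions; prefer the links above if they differ):

# Swin-B pretrained on ImageNet-22K (assumed variant)
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
# BERT-base (uncased) from Hugging Face; requires git-lfs
git lfs install
git clone https://huggingface.co/bert-base-uncased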

Usage

Train

By default, we use fp16 training for efficiency. To train a model on refcoco with 2 GPUs, modify YOUR_COCO_PATH, YOUR_REFER_PATH, YOUR_MODEL_PATH, and YOUR_CODE_PATH in scripts/train_refcoco.sh, then run:

sh scripts/train_refcoco.sh

You can change DATASET to refcoco+/refcocog/refcocom to train on other datasets. Note that RefCOCOg has two splits (umd and google); add --splitBy umd or --splitBy google to specify which one, as illustrated below.
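
As an illustration, a hypothetical excerpt of what one might set in scripts/train_refcoco.sh for a RefCOCOg run (the variable names and paths are illustrative, not the script's actual contents):

# inside scripts/train_refcoco.sh (illustrative)
DATASET=refcocog                # refcoco / refcoco+ / refcocog / refcocom
YOUR_COCO_PATH=/data/coco       # contains train2014/
YOUR_REFER_PATH=/data/refer     # contains refcoco, refcoco+, refcocog, refcocom
# for RefCOCOg, append --splitBy umd (or --splitBy google) to the training command, then:
sh scripts/train_refcoco.sh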

Test

Single-GPU evaluation is supported. To evaluate a model on refcoco, modify the settings in scripts/test_refcoco.sh and run:

sh scripts/test_refcoco.sh

You can change DATASET and SPLIT to evaluate on different splits of each dataset. Note that RefCOCOg has two splits (umd and google); add --splitBy umd or --splitBy google to specify which one. Models trained on refcocom can be evaluated directly on the splits of refcoco/refcoco+/refcocog (umd).
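
For example, a hypothetical configuration for evaluating on the RefCOCOg umd validation split (the variable names and split label are illustrative; check scripts/test_refcoco.sh for the actual ones):

# inside scripts/test_refcoco.sh (illustrative)
DATASET=refcocog   # refcoco / refcoco+ / refcocog / refcocom
SPLIT=val          # or testA / testB / test, depending on the dataset
# append --splitBy umd for RefCOCOg, then:
sh scripts/test_refcoco.sh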

References

This repo is mainly built upon LAVT and mmdetection. Thanks for their great work!

Citation

If you find our code useful, please consider citing:

@inproceedings{liu2023caris,
  title={CARIS: Context-Aware Referring Image Segmentation},
  author={Liu, Sun-Ao and Zhang, Yiheng and Qiu, Zhaofan and Xie, Hongtao and Zhang, Yongdong and Yao, Ting},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}