# Recurrent Multimodal Interaction for Referring Image Segmentation

This repository contains code for *Recurrent Multimodal Interaction for Referring Image Segmentation*, ICCV 2017.
If you use the code, please cite:

```
@inproceedings{liu2017recurrent,
  title={Recurrent Multimodal Interaction for Referring Image Segmentation},
  author={Liu, Chenxi and Lin, Zhe and Shen, Xiaohui and Yang, Jimei and Lu, Xin and Yuille, Alan},
  booktitle={{ICCV}},
  year={2017}
}
```
## Setup

- Tensorflow 1.2.1
- Download or use a symlink, such that the MS COCO images are under `data/coco/images/train2014/` (one possible layout is sketched after this list)
- Download or use a symlink, such that the ReferItGame data are under `data/referit/images` and `data/referit/mask`
- Run `mkdir external`. Download, git clone, or use a symlink, such that TF-resnet and TF-deeplab are under `external`. Then strictly follow the `Example Usage` sections of their READMEs
- Download, git clone, or use a symlink, such that refer is under `external`. Then strictly follow the `Setup` and `Download` sections of its README. Also put the `refer` folder in `PYTHONPATH`
- Download, git clone, or use a symlink, such that the MS COCO API is under `external` (i.e. `external/coco/PythonAPI/pycocotools`)
- Install pydensecrf
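Below is a minimal shell sketch of one possible way to arrange the data and the `external` dependencies described above. The `/path/to/...` locations and `<...>` repository URLs are placeholders, not paths or URLs defined by this repository; substitute the actual locations of your downloads and the repositories linked above.

```bash
# Symlink already-downloaded data into the locations this repo expects
mkdir -p data/coco/images data/referit
ln -s /path/to/coco/train2014 data/coco/images/train2014
ln -s /path/to/referit/images data/referit/images
ln -s /path/to/referit/mask   data/referit/mask

# External dependencies
mkdir external
git clone <TF-resnet-url>  external/TF-resnet    # then follow its "Example Usage" section
git clone <TF-deeplab-url> external/TF-deeplab   # then follow its "Example Usage" section
git clone <refer-url>      external/refer        # then follow its "Setup" and "Download" sections
git clone <coco-api-url>   external/coco         # provides external/coco/PythonAPI/pycocotools
                                                 # (build pycocotools as the COCO API README describes)

# Make the refer folder importable
export PYTHONPATH=$PWD/external/refer:$PYTHONPATH

# Dense CRF post-processing (used by the -c flag at test time)
pip install pydensecrf
```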
## Data Preparation

```bash
python build_batches.py -d Gref -t train
python build_batches.py -d Gref -t val
python build_batches.py -d unc -t train
python build_batches.py -d unc -t val
python build_batches.py -d unc -t testA
python build_batches.py -d unc -t testB
python build_batches.py -d unc+ -t train
python build_batches.py -d unc+ -t val
python build_batches.py -d unc+ -t testA
python build_batches.py -d unc+ -t testB
python build_batches.py -d referit -t trainval
python build_batches.py -d referit -t test
```
## Training and Testing

Specify several options/flags and then run `main.py`:

- `-g`: Which GPU to use. Default is 0.
- `-m`: `train` or `test`. Training mode or testing mode.
- `-w`: `resnet` or `deeplab`. Specify pre-trained weights.
- `-n`: `LSTM` or `RMI`. Model name.
- `-d`: `Gref`, `unc`, `unc+`, or `referit`. Specify dataset.
- `-t`: `train`, `trainval`, `val`, `test`, `testA`, or `testB`. Which set to train/test on.
- `-i`: Number of training iterations in training mode; the iteration number of the snapshot in testing mode.
- `-s`: Used only in training mode. How many iterations per snapshot.
- `-v`: Used only in testing mode. Whether to visualize the prediction. Default is False.
- `-c`: Used only in testing mode. Whether to also apply Dense CRF. Default is False.
For example, to train the ResNet + LSTM model on Google-Ref using GPU 2, run

```bash
python main.py -m train -w resnet -n LSTM -d Gref -t train -g 2 -i 750000 -s 50000
```
To test the 650000-iteration snapshot of the DeepLab + RMI model on the UNC testA set using GPU 1 (with visualization and Dense CRF), run

```bash
python main.py -m test -w deeplab -n RMI -d unc -t testA -g 1 -i 650000 -v -c
```
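As another illustrative combination of the same flags (not a command or hyperparameter setting prescribed by this repository), training the DeepLab + RMI model on ReferIt, whose training batches are built from the `trainval` split in Data Preparation, could look like:

```bash
python main.py -m train -w deeplab -n RMI -d referit -t trainval -g 0 -i 750000 -s 50000
```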
## Miscellaneous

Code and data under `util/` and `data/referit/` are borrowed from text_objseg and slightly modified for compatibility with Tensorflow 1.2.1.

## TODO

- Add TensorBoard support.