Home

Awesome

phrase_detection

A Tensorflow implementation of phrase detection framework by Bryan Plummer (bplum@bu.edu) as described in "Revisiting Image-Language Networks for Open-ended Phrase Detection". This repository is based on the tensorflow implementation of Faster R-CNN available here which in turn was based on the python Caffe implementation of Faster RCNN available here.

Prerequisites

Code was tested using python 2.7

Installation

  1. Clone the repository
git clone --recursive https://github.com/BryanPlummer/phrase_detection.git

We shall refer to the repo's root directory as $ROOTDIR

  1. Update your -arch in setup script to match your GPU
cd $ROOTDIR/lib
# Change the GPU architecture (-arch) if necessary
vim setup.py
  1. Download pretrained COCO models of the desired network which were released in this repo. Dy default, the code assumes they have been unpacked in a directory called pretrained. For example, after downloading the res101 coco models, you would use:
mkdir $ROOTDIR/pretrained
cd $ROOTDIR/pretrained
tar zxvf $DOWNLOADDIR/coco_900-1190k.tgz
  1. Download a pretrained word embedding. By default, the code assumes you have downloaded the HGLMM 6K-D vectors from here and placed the unziped file in the data directory. If you want to use a different word embedding, please update the pointer to the embedding file and its dimensions in lib/model/config.py. E.g.,
cd $ROOTDIR/data
unzip $DOWNLOADDIR/hglmm_6kd.zip
  1. Download and unpack the Flickr30K Entities and ReferIt Game datasets and build the modules and vocabularies from $ROOTDIR using,
./data/scripts/fetch_datasets.sh

Train your own model

Assuming you completed the Installation setup correctly, you should be able to train a model with,

./experiments/scripts/train_phrase_detector.sh [GPU_ID] [DATASET] [NET] [TAG]
# GPU_ID is the GPU you want to test on
# NET in {vgg16, res50, res101, res152} is the network arch to use
# DATASET {flickr, referit} is defined in train_phrase_detector.sh
# TAG is an experiment name
# Examples:
./experiments/scripts/train_phrase_detector.sh 0 flickr res101 default
./experiments/scripts/train_phrase_detector.sh 1 referit res101 default

This will train the model without the augmented phrases, to train with augmented phrases use:

./experiments/scripts/train_augmented_phrase_detector.sh [GPU_ID] [DATASET] [NET] [TAG]

Test and evaluate

You can test your models using,

./experiments/scripts/test_phrase_detector.sh [GPU_ID] [DATASET] [NET] [TAG]
# GPU_ID is the GPU you want to test on
# NET in {vgg16, res50, res101, res152} is the network arch to use
# DATASET {flickr, referit} is defined in test_phrase_detector.sh
# TAG is an experiment name
# Examples:
./experiments/scripts/test_phrase_detector.sh 0 flickr res101 default
./experiments/scripts/test_phrase_detector.sh 1 referit res101 default

Analogously, to test with augmented phrases use:

./experiments/scripts/test_augmented_phrase_detector.sh [GPU_ID] [DATASET] [NET] [TAG]

By default, trained networks are saved under:

output/[NET]/[DATASET]/{TAG}/

Citation

If you find our code useful please consider citing:

@article{plummerPhrasedetection,
  title={Revisiting Image-Language Networks for Open-ended Phrase Detection},
  author={Bryan A. Plummer and Kevin J. Shih and Yichen Li and Ke Xu and Svetlana Lazebnik and Stan Sclaroff and Kate Saenko},
  journal={arXiv:1811.07212},
  year={2018}
}