Attribute Phrases

This repository contains the dataset and the TensorFlow training code used in the paper:

Jong-Chyi Su*, Chenyun Wu*, Huaizu Jiang, Subhransu Maji, "Reasoning about Fine-grained Attribute Phrases using Reference Games", International Conference on Computer Vision (ICCV), 2017

@inproceedings{su2017reasoning,
    Author = {Jong-Chyi Su and Chenyun Wu and Huaizu Jiang and Subhransu Maji},
    Title = {Reasoning about Fine-grained Attribute Phrases using Reference Games},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2017}
}

[Project page] [Paper]

Dataset

Each annotation consists of one pair of images and five pairs of corresponding attribute phrases.

| Image 1 | | Image 2 |
|:---:|:---:|:---:|
| <img src="dataset/images/1532637.jpg" height = "150"> | | <img src="dataset/images/1704089.jpg" height = "150"> |
| commercial plane | vs | private plane |
| large plane | vs | small plane |
| white and grey | vs | white with blue and red stripes |
| twin engines | vs | single engine |
| more windows on body | vs | less windows on body |
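
For readers who want to work with annotations in code, the example above could be held in a small record like the one below. This is only a sketch: the class name `Annotation` and its field names are hypothetical and do not reflect the file format shipped with the dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    """One annotated image pair (hypothetical layout, not the dataset's file format)."""
    image_1: str
    image_2: str
    # Each entry is (phrase describing image 1, phrase describing image 2).
    phrase_pairs: List[Tuple[str, str]]


example = Annotation(
    image_1="dataset/images/1532637.jpg",
    image_2="dataset/images/1704089.jpg",
    phrase_pairs=[
        ("commercial plane", "private plane"),
        ("large plane", "small plane"),
        ("white and grey", "white with blue and red stripes"),
        ("twin engines", "single engine"),
        ("more windows on body", "less windows on body"),
    ],
)

# Print the pairs in the same "A vs B" form used in the example above.
for phrase_1, phrase_2 in example.phrase_pairs:
    print(f"{phrase_1} vs {phrase_2}")
```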

Stats about the dataset

Requirements

Download Dataset

Download ImageNet Pre-trained Model

Place the pretrained model (e.g. vgg_16.ckpt) in models/checkpoints/.

Extract image features to a NumPy file to accelerate training

Go to utils/ and run:

python get_feature.py --dataset train

The NumPy file will be saved in img_feat/vgg_16/train.npy.
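
The idea behind this step is that image features are computed once and then read back from disk on later training runs. A minimal sketch of that caching pattern with NumPy follows. The array shape, the random stand-in features, and the temporary path are assumptions for illustration; the actual layout written by get_feature.py may differ.

```python
import os
import tempfile

import numpy as np

# Stand-in for features produced by an image model: 8 images, 4096 values each.
# Both numbers are hypothetical; the real shape depends on get_feature.py.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4096)).astype(np.float32)

# Save the features once...
feat_path = os.path.join(tempfile.mkdtemp(), "train.npy")
np.save(feat_path, features)

# ...so that later runs only need to load the file instead of re-running the image model.
cached = np.load(feat_path)
print(cached.shape, cached.dtype)
```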

Train Listener Model

Each listener is trained in two steps: step 1 keeps the image features fixed (--train_img_model 0), and step 2 fine-tunes them (--train_img_model 1).

SL (Simple Listener)

  1. python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 0 --max_steps 2000 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 1 --max_steps 7500 --load_model_path model-fixed-2000 --learn_rate 0.00001

SLr (Simple Listener trained without contrastive data)

  1. python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 0 --max_steps 5000 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 1 --max_steps 10000 --load_model_path model-fixed-5000 --learn_rate 0.00001

DL (Discerning Listener)

  1. python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 0 --max_steps 2000 --max_sent_length 17 --batch_size 128
  2. python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 1 --max_steps 7000 --load_model_path model-fixed-2000 --max_sent_length 17 --learn_rate 0.00001
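
The two-step schedule above follows a regular pattern: step 2 loads the checkpoint written at the end of step 1. The sketch below only assembles the command strings listed above and does not run anything. The helper name `two_step_commands` is hypothetical, and the checkpoint naming pattern model-fixed-<steps> is inferred from the commands in this README rather than from the training code.

```python
def two_step_commands(log_dir, pairwise, step1_steps, step2_steps, extra=""):
    """Return the step 1 (fixed features) and step 2 (fine-tuning) commands."""
    base = (
        f"python train_listener.py --mode train --log_dir {log_dir} "
        f"--pairwise {pairwise}"
    )
    if extra:
        base += f" {extra}"
    step1 = f"{base} --train_img_model 0 --max_steps {step1_steps} --batch_size 128"
    # Step 2 resumes from the checkpoint produced by step 1 (inferred naming).
    step2 = (
        f"{base} --train_img_model 1 --max_steps {step2_steps} "
        f"--load_model_path model-fixed-{step1_steps} --learn_rate 0.00001"
    )
    return step1, step2


# SL and SLr as listed above; DL additionally passes --max_sent_length 17.
sl_step1, sl_step2 = two_step_commands("result/SL", 0, 2000, 7500)
slr_step1, slr_step2 = two_step_commands(
    "result/SLr", 0, 5000, 10000, extra="--ran_neg_sample 1"
)

for command in (sl_step1, sl_step2, slr_step1, slr_step2):
    print(command)
```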

Evaluate Listener Model

SL

  1. python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 1 --load_model_path model-finetune-7500 --dataset val

SLr

  1. python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 0 --load_model_path model-fixed-5000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 1 --load_model_path model-finetune-10000 --dataset val

DL

  1. python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
  2. python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 1 --load_model_path model-finetune-7000 --dataset val

Train Speaker Model

Use Speaker to Generate Attribute Phrases

Discerning Speaker Model

Here we use the listener model to re-rank the attribute phrases generated by the speaker model. To run this step, you need a trained listener model and phrases generated by a speaker model.
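
Re-ranking in this sense amounts to scoring each speaker-generated phrase with the listener and sorting by that score. The sketch below illustrates only the sorting step. The names `rerank` and `listener_score` and the toy scores are hypothetical stand-ins; how the repository's listener actually scores a phrase against the image pair is not shown here.

```python
from typing import Callable, List, Tuple


def rerank(
    phrases: List[str], listener_score: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Sort candidate phrases by listener score, highest first."""
    scored = [(phrase, listener_score(phrase)) for phrase in phrases]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy stand-in for a trained listener: these scores are made up for illustration.
toy_scores = {"large plane": 0.9, "twin engines": 0.7, "white and grey": 0.5}

# Candidates in the order a speaker might have produced them.
candidates = ["white and grey", "large plane", "twin engines"]
ranked = rerank(candidates, toy_scores.__getitem__)

for phrase, score in ranked:
    print(f"{score:.2f}  {phrase}")
```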

Generate Set-wise Attribute Phrases

Authors

Please contact jcsu@cs.umass.edu if you have any questions.