This repository contains the code for end-to-end gLSTM (e2e-gLSTM) and for sentence-conditional semantic attention, as presented in the paper "Watch What You Just Said: Image Captioning with Text-Conditional Attention". train_new.lua is the main training script for e2e-gLSTM, and train_sc.lua is the main training script for sentence-conditional semantic attention. Here are example commands:

```
th train_new.lua -cnn_model_resnet /path/to/your/resnet-200-model -language_eval 1 -finetune_cnn_after 100000 -max_iters 600000 -cnn_weight_decay 0.001 -cnn_learning_rate 0.00001 -learning_rate_decay_every 100000 -learning_rate_decay_start 100000
th train_sc.lua -start_from /path/to/your/e2eglstm-checkpoint -language_eval 1 -language_model 'misc_tc.LanguageModel_sc' -max_iters 200000
```

Note that if you instead transfer weights from vgg-16 or resnet-34, the -max_iters value can be smaller, as in the example below.
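For instance, a run that transfers resnet-34 weights with a reduced iteration budget might look like the following (the -max_iters value of 300000 is illustrative, not a setting from the paper):

```
th train_new.lua -cnn_model_resnet /path/to/your/resnet-34-model -language_eval 1 -finetune_cnn_after 100000 -max_iters 300000
```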

The results reported in the paper are shown below.

| Methods | Bleu@4 | METEOR | CIDEr |
| --- | --- | --- | --- |
| sc-vgg-16 | 30.1 | 24.7 | 97.0 |
| sc-resnet-34 | 30.6 | 25.0 | 98.1 |
| sc-resnet-200 | 31.6 | 25.6 | 101.2 |
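The core idea behind sentence-conditional (text-conditional) attention is to weight image features by their compatibility with the text generated so far. The following is a minimal, self-contained Torch sketch of one such attention step; the dimensions, the projection W, and the random inputs are invented for illustration and do not come from this repository's modules.

```lua
-- A minimal sketch of one text-conditional attention step (illustrative
-- only; the sizes and the random projection W are invented, not taken
-- from this repository).
require 'torch'
require 'nn'

local D, K = 512, 196                 -- feature dim and number of image regions (assumed)
local feats = torch.randn(K, D)       -- CNN region features, one row per region
local word  = torch.randn(D)          -- embedding of the previously generated word

local W = torch.randn(D, D)           -- projection (a learned matrix in practice)
local scores  = feats * (W * word)    -- compatibility score per region (size K)
local alpha   = nn.SoftMax():forward(scores)  -- attention weights over regions
local context = feats:t() * alpha     -- text-conditioned image feature (size D)
print(context:size())                 -- 512
```

In a trained model the projection would be learned jointly with the rest of the network rather than drawn at random.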

The implementation is based on Neuraltalk2; please follow the Neuraltalk2 instructions to set up and run the code. Contact me if you have any trouble running it. Please cite the following paper if you use the code.
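Since the code builds on Neuraltalk2, evaluating a trained checkpoint presumably follows the same pattern as upstream Neuraltalk2's eval.lua; the command below is an assumption based on Neuraltalk2's documented flags, not a command verified against this fork:

```
th eval.lua -model /path/to/your/checkpoint.t7 -num_images -1 -language_eval 1
```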

@article{zhou2016image,
  title={Image Caption Generation with Text-Conditional Semantic Attention},
  author={Zhou, Luowei and Xu, Chenliang and Koch, Parker and Corso, Jason J},
  journal={arXiv preprint arXiv:1606.04621},
  year={2016}
}