Review Network for Caption Generation

Image Captioning on MSCOCO

You can use the code in this repo to generate an MSCOCO evaluation server submission with CIDEr 0.96+ in just a few hours.

No fine-tuning required. No fancy tricks. Just train three end-to-end review networks and ensemble them.
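The ensembling itself is simple: average the per-step next-token distributions of the independently trained models while decoding. Below is a rough illustrative sketch of this idea, not the repo's actual code; `next_token_probs` and the greedy loop are hypothetical stand-ins for however your models expose per-step probabilities.

```python
import numpy as np

def ensemble_step(prob_dists):
    """Average the next-token distributions predicted by several models."""
    return np.mean(prob_dists, axis=0)

def greedy_decode(models, image_features, max_len=20, eos_id=0):
    """Greedy decoding with an ensemble: at each step, every model scores
    the next token given the same prefix, and the averaged distribution
    picks the token. `model.next_token_probs` is a hypothetical API."""
    caption = []
    for _ in range(max_len):
        dists = [m.next_token_probs(image_features, caption) for m in models]
        token = int(np.argmax(ensemble_step(dists)))
        if token == eos_id:
            break
        caption.append(token)
    return caption
```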

Below is a comparison with other state-of-the-art systems (with corresponding published papers) on the MSCOCO evaluation server:

| Model | BLEU-4 | METEOR | ROUGE-L | CIDEr | Fine-tuned | Task-specific features |
|---|---|---|---|---|---|---|
| Attention | 0.537 | 0.322 | 0.654 | 0.893 | No | No |
| MS Research | 0.567 | 0.331 | 0.662 | 0.925 | No | Yes |
| Google NIC | 0.587 | 0.346 | 0.682 | 0.946 | Yes | No |
| Semantic Attention | 0.599 | 0.335 | 0.682 | 0.958 | No | Yes |
| Review Net | 0.597 | 0.347 | 0.686 | 0.969 | No | No |

In the directory image_caption_online, you can use the code therein to reproduce our evaluation server results.

In the directory image_caption_offline, you can rerun experiments in our paper using offline evaluation.

Code Captioning

Predicting comments for a piece of source code is another interesting task. In this repo we also release a dataset with train/dev/test splits, along with the code of a review network.

Check out the directory code_caption.
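For orientation, here is a minimal PyTorch-style sketch of the reviewer module described in the paper: the reviewer runs a fixed number of attentive review steps over the encoder's hidden states, and the resulting thought vectors are what the decoder attends to. This is an illustrative reimplementation under assumed shapes and layer choices; the released code need not be structured this way.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reviewer(nn.Module):
    """Attentive reviewer: performs T_r review steps over encoder states,
    producing thought vectors f_1, ..., f_{T_r} (paper notation)."""

    def __init__(self, hidden_size, num_review_steps=8):
        super().__init__()
        self.num_review_steps = num_review_steps
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        self.attn = nn.Linear(2 * hidden_size, 1)

    def forward(self, enc_states):
        # enc_states: (batch, seq_len, hidden) encoder hidden states
        batch, seq_len, hidden = enc_states.shape
        h = enc_states.new_zeros(batch, hidden)
        c = enc_states.new_zeros(batch, hidden)
        thoughts = []
        for _ in range(self.num_review_steps):
            # attention over encoder states, conditioned on the reviewer state
            scores = self.attn(torch.cat(
                [enc_states, h.unsqueeze(1).expand_as(enc_states)], dim=-1))
            alpha = F.softmax(scores, dim=1)            # (batch, seq_len, 1)
            context = (alpha * enc_states).sum(dim=1)   # (batch, hidden)
            h, c = self.cell(context, (h, c))           # one review step
            thoughts.append(h)
        # thought vectors for the decoder to attend to: (batch, T_r, hidden)
        return torch.stack(thoughts, dim=1)
```

Unlike standard attention over input positions, the number of thought vectors is fixed by `num_review_steps` rather than by the input length, which is the core design choice of the review network.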

Below is a comparison with baselines on the code captioning dataset (LLH: log-likelihood; CS-k: top-k character savings):

| Model | LLH | CS-1 | CS-2 | CS-3 | CS-4 | CS-5 |
|---|---|---|---|---|---|---|
| LSTM Language Model | -5.34 | 0.2340 | 0.2763 | 0.3000 | 0.3153 | 0.3290 |
| Encoder-Decoder | -5.25 | 0.2535 | 0.2976 | 0.3201 | 0.3367 | 0.3507 |
| Encoder-Decoder (Bidir) | -5.19 | 0.2632 | 0.3068 | 0.3290 | 0.3442 | 0.3570 |
| Attentive Encoder-Decoder (Bidir) | -5.14 | 0.2716 | 0.3152 | 0.3364 | 0.3523 | 0.3651 |
| Review Net | -5.06 | 0.2889 | 0.3361 | 0.3579 | 0.3731 | 0.3840 |

References

This repo contains the code and data used in the following paper:

Review Networks for Caption Generation

Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen

NIPS 2016