Show, Infer and Tell: Contextual Inference for Creative Captioning

Best Student Paper Award Honourable Mention (Top 0.5%) at the 30th British Machine Vision Conference (BMVC) 2019

Code for the paper "Show, Infer and Tell: Contextual Inference for Creative Captioning". Paper link: https://bmvc2019.org/wp-content/uploads/papers/0655-paper.pdf

Code supports:

Requirements

Download COCO captions and preprocess them

Extract dataset_coco.json from the zip file included in this repository and copy it into data/. This file provides the preprocessed captions along with the standard train-val-test splits.
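As a quick sanity check, you can load the file and confirm the splits are present. This is a minimal sketch, not part of the repository; the field names (`images`, `split`, `sentences`, `raw`, `filename`) assume the usual layout of Karpathy's dataset_coco.json, so verify them against your copy:

```python
import json
from collections import Counter

# Minimal sketch: sanity-check the splits file after copying it to data/.
# Field names assume the usual layout of Karpathy's dataset_coco.json.
with open('data/dataset_coco.json') as f:
    dataset = json.load(f)

# Count how many images land in each split (train/val/test, plus restval).
print(Counter(img['split'] for img in dataset['images']))

# Peek at one image entry: its filename, split, and raw captions.
first = dataset['images'][0]
print(first['filename'], first['split'])
print([s['raw'] for s in first['sentences']])
```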

The code is based on ruotianluo's codebase and neuraltalk2. Please refer to them for more details on setup.

Notes about Training

The final model, with an LSTM size of 2048 units, is over 2 GB. With a batch size of 10, training takes about 3 GB of GPU memory.
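If you are tight on memory, a probe like the one below can help you find a batch size that fits your card. It is a sketch using PyTorch's standard memory counters, not part of this repository; call it before and after a forward/backward pass:

```python
import torch

# Sketch of a GPU-memory probe (not part of this repo): reports how much
# memory PyTorch currently holds on the active CUDA device.
def log_gpu_memory(tag=''):
    if not torch.cuda.is_available():
        print(f'{tag}: no CUDA device')
        return
    allocated = torch.cuda.memory_allocated() / 1024**3  # tensors currently held
    reserved = torch.cuda.memory_reserved() / 1024**3    # total cached by PyTorch
    print(f'{tag}: allocated {allocated:.2f} GiB, reserved {reserved:.2f} GiB')

# Example usage inside a training loop:
# log_gpu_memory('after backward')
```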

Evaluate on Karpathy's test split and COCO split

The code includes a COCO split option and an evaluation script for testing on the COCO evaluation server.
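For the COCO server, results must be submitted as a JSON array of `{"image_id", "caption"}` records, one per image. The sketch below shows that format; `generated` and the output filename are placeholders for your own decode output:

```python
import json

# The COCO caption server expects a JSON array of records, one per image:
#   [{"image_id": <int COCO id>, "caption": <decoded string>}, ...]
# `generated` is a placeholder for your model's id -> caption output.
generated = {
    391895: 'a man riding a bike on a dirt road',
    522418: 'a woman cutting a cake on a table',
}

results = [{'image_id': i, 'caption': c} for i, c in generated.items()]

# Filename is a placeholder; follow the server's naming convention when uploading.
with open('captions_test2014_mymodel_results.json', 'w') as f:
    json.dump(results, f)
```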

Reference

If you use the code, please consider citing:

@inproceedings{DBLP:conf/bmvc/KhareH19,
  author    = {Ankit Khare and
               Manfred Huber},
  title     = {Show, Infer and Tell: Contextual Inference for Creative Captioning},
  booktitle = {30th British Machine Vision Conference 2019, {BMVC} 2019, Cardiff,
               UK, September 9-12, 2019},
  pages     = {20},
  publisher = {BMVA Press},
  year      = {2019},
  url       = {https://bmvc2019.org/wp-content/uploads/papers/0655-paper.pdf}
}

Acknowledgements

Thanks to Google Cloud for the free trial credits, the original neuraltalk2, rluo, and the awesome PyTorch team.