[Deprecated] Image Caption Generator
Notice: This project uses an older version of TensorFlow and is no longer supported. Please consider more recent alternatives.
A Neural Network based generative model for captioning images.
Check out the Android app built with this image-captioning model: Cam2Caption, and the associated paper.
Work in Progress
Updates (Jan 14, 2018):
- Some Code Refactoring.
- Added MSCOCO dataset support.
Updates (Mar 12, 2017):
- Added Dropout Layer for LSTM, Xavier Glorot Initializer for Weights
- Significant Optimizations for Caption Generation, i.e. the Decode Routine; computation time reduced from 3 seconds to 0.2 seconds
- Functionality to Freeze Graphs and Merge them.
- Direct Serving (Dual Graph and Single Graph) Routines in /util/
- Explored and chose the fastest and most efficient Image Preprocessing Method.
- Ported code to TensorFlow r1.0
Updates (Feb 27, 2017):
- Added BLEU evaluation metric and batch processing of images to produce batches of captions.
Updates (Feb 25, 2017):
- Added optimizations and one-time pre-processing of Flickr30K data
- Changed to a faster Image Preprocessing method using OpenCV
To-Do (Open for Contribution):
- FIFO-queues in training
- Attention-Model
- Trained Models for Distribution.
Pre-Requisites:
- TensorFlow r1.0
- NLTK
- pandas
- Download Flickr30K OR MSCOCO images and captions.
- Download the pre-trained InceptionV4 TensorFlow graph from DeepDetect, available here
Procedure to Train and Generate Captions:
- Clone the Repository to preserve Directory Structure
- For Flickr30K, put results_20130124.token and the Flickr30K images in the flickr30k-images folder; for MSCOCO, put captions_val2014.json and the MSCOCO images in the COCO-images folder.
- Put inception_v4.pb in ConvNets folder
- Generate features (features.npy) corresponding to the images in the dataset folder by running the command for your dataset (a sketch of the underlying feature extraction follows these commands):
- For Flickr30K:
python convfeatures.py --data_path Dataset/flickr30k-images --inception_path ConvNets/inception_v4.pb
- For MSCOCO:
python convfeatures.py --data_path Dataset/COCO-images --inception_path ConvNets/inception_v4.pb
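For intuition, convfeatures.py essentially loads the frozen InceptionV4 graph and pushes each image through it, collecting the pooled activations into features.npy. Below is a minimal sketch of that idea; the tensor names, image glob, and exact preprocessing are assumptions, so check the script for the real ones:

```python
import glob
import cv2
import numpy as np
import tensorflow as tf

# Load the frozen InceptionV4 graph (inception_v4.pb).
with tf.gfile.GFile("ConvNets/inception_v4.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="inception")

graph = tf.get_default_graph()
# Hypothetical tensor names; inspect the graph for the actual ones.
image_in = graph.get_tensor_by_name("inception/InputImage:0")
feature_out = graph.get_tensor_by_name(
    "inception/InceptionV4/Logits/AvgPool_1a/AvgPool:0")

def extract(sess, path):
    # OpenCV loads BGR; Inception expects RGB in [-1, 1] at 299x299.
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (299, 299)).astype(np.float32)
    img = (img / 255.0 - 0.5) * 2.0
    return sess.run(feature_out, {image_in: img[np.newaxis]}).squeeze()

image_paths = sorted(glob.glob("Dataset/flickr30k-images/*"))
with tf.Session(graph=graph) as sess:
    feats = [extract(sess, p) for p in image_paths]
np.save("Dataset/features.npy", np.asarray(feats))
```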
- To train the model, run the command for your dataset (a sketch of the training objective follows these commands):
- For Flickr30K:
python main.py --mode train --caption_path ./Dataset/results_20130124.token --feature_path ./Dataset/features.npy --resume
- For MSCOCO:
python main.py --mode train --caption_path ./Dataset/captions_val2014.json --feature_path ./Dataset/features.npy --data_is_coco --resume
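For a sense of what training does: the model conditions an LSTM language model on the image features and minimizes next-word cross-entropy. The sketch below illustrates the idea only; the dimensions, variable names, and layer choices are assumptions, not the repository's exact graph:

```python
import tensorflow as tf

# Illustrative sizes; the repository's values may differ.
FEAT, EMBED, HIDDEN, VOCAB, MAXLEN = 1536, 256, 512, 10000, 20

features = tf.placeholder(tf.float32, [None, FEAT])  # rows of features.npy
captions = tf.placeholder(tf.int32, [None, MAXLEN])  # padded word-id sequences

# Project the image features and embed the caption words.
img_embed = tf.layers.dense(features, EMBED)
word_table = tf.get_variable("word_table", [VOCAB, EMBED])
word_embed = tf.nn.embedding_lookup(word_table, captions[:, :-1])

# The image embedding is the first LSTM input, then the shifted caption.
lstm_in = tf.concat([img_embed[:, tf.newaxis, :], word_embed], axis=1)
cell = tf.contrib.rnn.LSTMCell(HIDDEN)
outputs, _ = tf.nn.dynamic_rnn(cell, lstm_in, dtype=tf.float32)

# Predict each next word and minimize cross-entropy.
logits = tf.layers.dense(outputs[:, 1:, :], VOCAB)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=captions[:, 1:], logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```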
- To generate captions for an image, run the command below (a sketch of the greedy decode loop follows this list):
python main.py --mode test --image_path VALID_PATH
- For usage as a Python library, see Demo.ipynb
(see python main.py -h for more)
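For context on what test mode is doing: caption generation is a greedy decode, running the encoder once on the image and then repeatedly feeding the most likely word back into the decoder. A sketch of that loop is below; the ops dictionary of tensor handles is hypothetical and stands in for whatever the loaded graph actually exposes:

```python
import numpy as np

def greedy_decode(sess, ops, feature, start_id, end_id, max_len=20):
    """Illustrative greedy decode; `ops` holds hypothetical tensor handles:
    "init" (runs the LSTM on the image embedding and returns a state),
    "step" (one LSTM step returning (logits, state)), plus the matching
    placeholders "feature_in", "word_in", and "state_in".
    """
    state = sess.run(ops["init"], {ops["feature_in"]: feature[np.newaxis]})
    word, caption = start_id, []
    for _ in range(max_len):
        logits, state = sess.run(ops["step"], {ops["word_in"]: [word],
                                               ops["state_in"]: state})
        word = int(np.argmax(logits))
        if word == end_id:   # stop at the end-of-sentence token
            break
        caption.append(word)
    return caption           # word ids; map back through the vocabulary
```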
Miscellaneous Notes:
Freezing the encoder and decoder Graphs
- It's necessary to save both the encoder and decoder graphs while running test; this one-time run is required before freezing the encoder/decoder:
python main.py --mode test --image_path ANY_TEST_IMAGE.jpg/png --saveencoder --savedecoder
- In the project root directory, use:
python utils/save_graph.py --mode encoder --model_folder model/Encoder/
Additionally, you may want to use --read_file if you want the frozen encoder to generate a caption directly from an image file (path).
- Similarly, for the decoder, use:
python utils/save_graph.py --mode decoder --model_folder model/Decoder/
The --read_file argument is not necessary for the decoder.
- To use the frozen encoder and decoder models as a dual black box, see Serve-DualProtoBuf.ipynb. Note: you must freeze the encoder graph with --read_file to run this notebook.
(see python utils/save_graph.py -h for more)
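Under the hood, utils/save_graph.py boils down to TensorFlow's standard freezing recipe: restore the latest checkpoint and convert variables to constants. A minimal sketch, assuming a checkpoint in model/Encoder/ and an illustrative output node name:

```python
import tensorflow as tf

MODEL_FOLDER = "model/Encoder/"
OUTPUT_NODES = ["encoder_output"]  # assumption: use the graph's real node names

ckpt = tf.train.get_checkpoint_state(MODEL_FOLDER).model_checkpoint_path
saver = tf.train.import_meta_graph(ckpt + ".meta")

with tf.Session() as sess:
    saver.restore(sess, ckpt)
    # Bake the trained variables into the graph as constants.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, tf.get_default_graph().as_graph_def(), OUTPUT_NODES)

with tf.gfile.GFile("model/Trained_Graphs/encoder_frozen_model.pb", "wb") as f:
    f.write(frozen.SerializeToString())
```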
Merging the encoder and decoder graphs for serving the model as a black box:
- It's necessary to freeze the encoder and decoder as mentioned above.
- In the project root directory, run:
python utils/merge_graphs.py --encpb ./model/Trained_Graphs/encoder_frozen_model.pb --decpb ./model/Trained_Graphs/decoder_frozen_model.pb
Additionally, you may want to use --read_file if you want the merged graph to generate a caption directly from an image file (path).
- To use the merged encoder and decoder models as a single frozen black box, see Serve-SingleProtoBuf.ipynb. Note: you must freeze and merge the encoder graph with --read_file to run this notebook.
(see python utils/merge_graphs.py -h for more)
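Conceptually, utils/merge_graphs.py imports both frozen graph defs into a single graph and wires the encoder's output tensor into the decoder's input via input_map. A minimal sketch, where the tensor names "enc_out:0" and "dec_in" are placeholders, not the repository's actual names:

```python
import tensorflow as tf

def load_graph_def(path):
    """Read a frozen .pb file into a GraphDef."""
    with tf.gfile.GFile(path, "rb") as f:
        gd = tf.GraphDef()
        gd.ParseFromString(f.read())
    return gd

enc = load_graph_def("model/Trained_Graphs/encoder_frozen_model.pb")
dec = load_graph_def("model/Trained_Graphs/decoder_frozen_model.pb")

merged = tf.Graph()
with merged.as_default():
    # Import the encoder and grab its output, then map it onto the
    # decoder's input placeholder.
    enc_out = tf.import_graph_def(enc, name="encoder",
                                  return_elements=["enc_out:0"])[0]
    tf.import_graph_def(dec, name="decoder", input_map={"dec_in": enc_out})

with tf.gfile.GFile("model/Trained_Graphs/merged_frozen_model.pb", "wb") as f:
    f.write(merged.as_graph_def().SerializeToString())
```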
Training Steps vs Loss Graph in TensorBoard:
- Run:
tensorboard --logdir model/log_dir
- Navigate to localhost:6006
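The plotted loss comes from standard TF 1.x summaries. If you extend the training loop, logging looks roughly like this (a generic sketch with a dummy loss value, not the repository's exact code):

```python
import tensorflow as tf

loss = tf.placeholder(tf.float32, name="loss")  # stand-in for the model's loss tensor
tf.summary.scalar("loss", loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    writer = tf.summary.FileWriter("model/log_dir", sess.graph)
    for step in range(100):
        summary = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        writer.add_summary(summary, step)  # appears under the "loss" tag
    writer.close()
```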
Citation:
If you use our model or code in your research, please cite the paper:
@article{Mathur2017,
  title={Camera2Caption: A Real-time Image Caption Generator},
  author={Pranay Mathur and Aman Gill and Aayush Yadav and Anurag Mishra and Nand Kumar Bansode},
  journal={IEEE Conference Publication},
  year={2017}
}
Reference:
Show and Tell: A Neural Image Caption Generator
by Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan
License:
Protected under the BSD 3-Clause License.
Some Examples: