# Adaptive Attention in PyTorch
PyTorch implementation of the paper *Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning*.<br/> The original Torch implementation by Lu et al. can be found here.
<img width="802" alt="ss" src="https://user-images.githubusercontent.com/30661597/62519932-15e14700-b85f-11e9-88e8-9f3a36993723.png">
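At each decoding step, the model computes a sentinel gate β<sub>t</sub> that decides how much of the context should come from the visual sentinel (the decoder's language memory) versus the attended image regions; 1 − β<sub>t</sub> is the visual grounding probability visualized in the results below. The following is a minimal, self-contained sketch of that mixing step only; the module, argument names, and projection sizes are assumptions for illustration and do not mirror the actual `models.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttention(nn.Module):
    """Minimal sketch of adaptive attention with a visual sentinel.

    V: spatial image features, shape (batch, num_regions, feat_dim)
    h: current decoder hidden state, shape (batch, feat_dim)
    s: visual sentinel, shape (batch, feat_dim)
    """

    def __init__(self, feat_dim: int, att_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, att_dim)      # projects image regions
        self.hidden_proj = nn.Linear(feat_dim, att_dim)    # projects decoder state
        self.sentinel_proj = nn.Linear(feat_dim, att_dim)  # projects the sentinel
        self.score = nn.Linear(att_dim, 1)                 # scalar attention scores

    def forward(self, V, h, s):
        h_att = self.hidden_proj(h).unsqueeze(1)                                      # (B, 1, A)
        region_scores = self.score(torch.tanh(self.feat_proj(V) + h_att))             # (B, R, 1)
        sentinel_score = self.score(torch.tanh(self.sentinel_proj(s).unsqueeze(1) + h_att))  # (B, 1, 1)

        # Softmax over the image regions *and* the sentinel (R + 1 candidates).
        alpha = F.softmax(torch.cat([region_scores, sentinel_score], dim=1), dim=1)   # (B, R+1, 1)
        spatial_alpha, beta = alpha[:, :-1], alpha[:, -1]                             # beta: weight on the sentinel

        c = (spatial_alpha * V).sum(dim=1)          # attended image context
        c_hat = beta * s + (1.0 - beta) * c         # adaptive context vector
        return c_hat, beta.squeeze(-1)              # (1 - beta) is the visual grounding probability
```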
## Instructions
- Download the COCO 2014 dataset from here. In particular, you'll need the 2014 Training, Validation and Testing images, as well as the 2014 Train/Val annotations.
- Download Karpathy's Train/Val/Test Split. You may download it from here (a sketch of how this split file is typically read is shown after this list).
- If you want to do evaluation on COCO, make sure to download the COCO API from here if you're on Linux, or from here if you're on Windows. Then download the COCO caption toolkit from here and rename the folder to `cococaption`. (This also requires Java; simply download it from here if you don't have it.)
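As a rough illustration of what downstream preprocessing might do with Karpathy's split: the commonly distributed `dataset_coco.json` groups images by split and stores tokenized reference captions. The path and field names below follow that common layout and are assumptions, not something verified against this repository's `preprocess.py`.

```python
import json
from collections import defaultdict

# Hypothetical path -- point this at wherever you unpacked Karpathy's split.
with open("caption_data/dataset_coco.json") as f:
    karpathy = json.load(f)

# Each entry records the COCO folder, file name, split, and tokenized reference captions.
images_by_split = defaultdict(list)
for img in karpathy["images"]:
    captions = [sentence["tokens"] for sentence in img["sentences"]]
    images_by_split[img["split"]].append((img["filepath"], img["filename"], captions))

print({split: len(items) for split, items in images_by_split.items()})
```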
## Files
`preprocess.py` Creates the `WORDMAP.json` file and the `.h5` files<br/>
`dataset.py` Creates the custom dataset (a rough sketch is shown at the end of this section)<br/>
`util.py` Functions used throughout the code<br/>
`models.py` Defines the architectures<br/>
`train_eval` For training and evaluation<br/>
`run.ipynb` For testing and visualization<br/>
The folder `caption data` includes data used alongside the images, mainly for evaluation purposes.
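As an illustration of how `dataset.py` might wrap the preprocessed files, here is a minimal sketch of a caption dataset over an image `.h5` file, an encoded-captions JSON file, and the `WORDMAP.json`. The file names, keys, and captions-per-image convention are assumptions for this sketch; the actual layout is defined by `preprocess.py`.

```python
import json
import h5py
import torch
from torch.utils.data import Dataset

class CaptionDataset(Dataset):
    """Sketch of a dataset over a preprocessed image .h5 file and encoded captions.

    File names and keys here are assumptions; check preprocess.py for the real layout.
    """

    def __init__(self, h5_path, captions_path, wordmap_path, captions_per_image=5):
        self.h5_path = h5_path
        self.cpi = captions_per_image
        with open(captions_path) as f:
            self.captions = json.load(f)        # integer-encoded captions
        with open(wordmap_path) as f:
            self.word_map = json.load(f)        # word -> index mapping
        self.h5 = None                          # opened lazily (plays nicely with DataLoader workers)

    def __len__(self):
        return len(self.captions)

    def __getitem__(self, i):
        if self.h5 is None:
            self.h5 = h5py.File(self.h5_path, "r")
        image = torch.from_numpy(self.h5["images"][i // self.cpi]).float()
        caption = torch.tensor(self.captions[i], dtype=torch.long)
        return image, caption
```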
## Testing
Place the test image in the folder `test_imgs`, name it `test.jpg`, and then run the `run.ipynb` Jupyter notebook to get the results.
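The notebook handles this end to end; purely as an illustration, the kind of preprocessing an ImageNet-pretrained encoder typically expects looks like the snippet below (the resize dimensions and normalization constants are the standard ImageNet ones, assumed rather than taken from `run.ipynb`).

```python
from PIL import Image
import torchvision.transforms as T

# Standard ImageNet-style preprocessing; run.ipynb may use different sizes.
preprocess = T.Compose([
    T.Resize((256, 256)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("test_imgs/test.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # shape (1, 3, 256, 256), ready for the encoder
```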
## Results
The file here contains the obtained evaluation scores on the validation split. The pretrained model is provided here as well (trained for 12 epochs). Some results from Karpathy's split are shown below. The visual grounding probability (1 - beta) is shown in green. <br/> <br/>
## References
Code adapted from sgrvinod's implementation of "Show, Attend and Tell".