Context-Aware Visual Policy Network for Sequence-Level Image Captioning

This repository contains the code for the paper cited below.

Installation

  1. Install Python 3 (Anaconda recommended).
  2. Install PyTorch v1.0 or higher:
pip3 install torch torchvision
  3. Clone the repository with Git, then enter the root directory:
git clone --recursive https://github.com/daqingliu/CAVP.git && cd CAVP
  4. Install the requirements for the evaluation metrics:
apt install default-jdk
cd coco-caption && bash get_stanford_models.sh && cd ..

Download Data

  1. Download the image features (tsv files extracted with bottom-up-attention) into data and unzip them.
  2. Convert the tsv files to npz files that the dataloader can read:
python misc/convert_tsv_to_npz.py
  3. Download the COCO annotations (h5 and json files) into data.
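
The conversion script is included in the repo; as a rough sketch of what such a step involves, the standard bottom-up-attention tsv rows store base64-encoded box coordinates and region features that get decoded into numpy arrays. The field names below follow the common bottom-up-attention tsv layout; the function names and output naming are illustrative, not necessarily what misc/convert_tsv_to_npz.py actually does.

```python
import base64
import csv
import sys

import numpy as np

# Standard bottom-up-attention tsv columns (assumption: the downloaded
# features use this common layout).
FIELDS = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

def convert_row(row):
    """Decode one tsv row: base64 blobs -> (image_id, boxes, features) arrays."""
    num_boxes = int(row["num_boxes"])
    boxes = np.frombuffer(base64.b64decode(row["boxes"]),
                          dtype=np.float32).reshape(num_boxes, 4)
    feats = np.frombuffer(base64.b64decode(row["features"]),
                          dtype=np.float32).reshape(num_boxes, -1)
    return row["image_id"], boxes, feats

def tsv_to_npz(tsv_path, out_dir):
    """Write one <image_id>.npz per tsv row (illustrative output naming)."""
    csv.field_size_limit(sys.maxsize)  # feature blobs exceed the default limit
    with open(tsv_path) as f:
        for row in csv.DictReader(f, delimiter="\t", fieldnames=FIELDS):
            image_id, boxes, feats = convert_row(row)
            np.savez(f"{out_dir}/{image_id}.npz", boxes=boxes, feat=feats)
```

Per-image npz files let the dataloader fetch a single image's features without scanning the whole tsv.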

Training and Evaluation

Simply run:

bash run_train.sh
bash run_eval.sh
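
Sequence-level training here means optimizing a caption-level reward with policy gradients rather than per-word cross-entropy. As a minimal sketch of the self-critical baseline idea the training builds on (the repo extends self-critical.pytorch), the REINFORCE loss uses the greedy decode's reward as a baseline; the function name and signature below are illustrative, not the repo's actual API.

```python
import numpy as np

def scst_loss(log_probs, sample_reward, greedy_reward):
    """Self-critical sequence training loss for one caption (sketch).

    log_probs: per-word log-probabilities of the *sampled* caption.
    sample_reward: sequence-level reward (e.g. CIDEr) of the sampled caption.
    greedy_reward: reward of the greedy-decoded caption, used as baseline.
    """
    # REINFORCE with a self-critical baseline: minimizing this pushes
    # probability mass toward samples that beat the greedy decode.
    advantage = sample_reward - greedy_reward
    return -advantage * np.sum(log_probs)
```

Using the model's own greedy output as the baseline needs no learned value function and keeps the gradient estimate low-variance.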

Citation

@article{zha2019context,
  title={Context-aware visual policy network for fine-grained image captioning},
  author={Zha, Zheng-Jun and Liu, Daqing and Zhang, Hanwang and Zhang, Yongdong and Wu, Feng},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2019},
}

Acknowledgements

Part of this repository is built upon self-critical.pytorch.