obj-centric-unsup-caption

Official PyTorch implementation of the paper: Object-Centric Unsupervised Image Captioning (ECCV 2022)

[Illustration]

Requirements

Prepare data

All the data needed to run the code can be downloaded from here.

Data needed for evaluation (using Localized Narratives captions on COCO): download 'captions_LN_val2014_norepeat.json' and put it in 'path_to/coco-caption/annotations/'.
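For context, here is a minimal sketch of how such an annotation file is typically consumed by the coco-caption toolkit. The results file name and its format in the code ('results.json', a list of {"image_id", "caption"} entries) are assumptions for illustration, not part of this repo:

    # Minimal sketch: scoring generated captions against the Localized Narratives
    # annotations with the coco-caption toolkit. The results file ('results.json',
    # a list of {"image_id": int, "caption": str} dicts) is an assumed name/format;
    # adapt it to the outputs this repo actually produces.
    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    ann_file = 'path_to/coco-caption/annotations/captions_LN_val2014_norepeat.json'
    res_file = 'results.json'  # hypothetical path to generated captions

    coco = COCO(ann_file)              # reference (ground-truth) captions
    coco_res = coco.loadRes(res_file)  # generated captions

    coco_eval = COCOEvalCap(coco, coco_res)
    coco_eval.params['image_id'] = coco_res.getImgIds()  # score only captioned images
    coco_eval.evaluate()

    for metric, score in coco_eval.eval.items():
        print('%s: %.3f' % (metric, score))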

Data needed for training: download all the data from the link above and put it in './data', OR follow these preprocessing steps:

  1. Download 'coco_Dataset.json' and 'Dataset_label.h5' (where Dataset is GCC or SS), which contain the image-dataset and text-dataset info.
  2. Download 'box_only.zip' and 'feats_only.zip', which contain the detected boxes and features of COCO images, OR extract them yourself using Detectron2 (a sketch is given at the end of this section).
  3. Download 'objects_vocab.txt' (the object category names), then run the script below to extract the visual object tokens for each image and construct the object-to-image mapping (a rough, illustrative sketch of this step also follows):
    python preprocessing/construct_obj_to_img_map.py
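
The script above is the repo's actual preprocessing step; the following is only a rough sketch of the idea behind step 3 (extract the unique object tokens per image and map each object token back to the images containing it). The input layout (per-image .npz files holding a 'classes' array) and the output file name are assumptions made for illustration:

    # Rough sketch of step 3: turn per-image detected class indices into visual
    # object tokens and an object-to-image mapping. The per-image .npz layout
    # (a 'classes' array, as produced e.g. by the Detectron2 sketch further below)
    # and the output name are assumptions; the real script is
    # preprocessing/construct_obj_to_img_map.py.
    import glob
    import json
    import os
    from collections import defaultdict

    import numpy as np

    # Object category names, one per line, indexed by detected class id.
    with open('data/objects_vocab.txt') as f:
        vocab = [line.strip() for line in f]

    obj_to_imgs = defaultdict(list)  # object token -> image ids containing it
    img_to_objs = {}                 # image id -> unique object tokens in the image

    for path in glob.glob('data/box_only/*.npz'):  # hypothetical detection dumps
        image_id = os.path.splitext(os.path.basename(path))[0]
        classes = np.load(path)['classes']
        tokens = sorted({vocab[int(c)] for c in classes})
        img_to_objs[image_id] = tokens
        for token in tokens:
            obj_to_imgs[token].append(image_id)

    with open('data/obj_to_img_map.json', 'w') as f:  # hypothetical output name
        json.dump({'obj_to_imgs': obj_to_imgs, 'img_to_objs': img_to_objs}, f)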
    

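For step 2, if you extract the detections yourself, a generic Detectron2 setup looks roughly like the sketch below. The actual detector, score threshold, and dump format behind 'box_only.zip'/'feats_only.zip' are not documented here, so the model choice and file names in the code are assumptions:

    # Illustrative Detectron2 detection pass (not necessarily the detector or
    # output format used to build 'box_only.zip'/'feats_only.zip').
    import cv2
    import numpy as np
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    # Assumed model: a standard COCO Faster R-CNN from the Detectron2 model zoo.
    config_name = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
    cfg.merge_from_file(model_zoo.get_config_file(config_name))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
    predictor = DefaultPredictor(cfg)

    image = cv2.imread("COCO_val2014_000000000042.jpg")  # hypothetical image path
    instances = predictor(image)["instances"].to("cpu")

    boxes = instances.pred_boxes.tensor.numpy()  # (N, 4) boxes in xyxy format
    classes = instances.pred_classes.numpy()     # (N,) detected class indices
    scores = instances.scores.numpy()            # (N,) detection confidences

    # Save per-image arrays in the layout assumed by the mapping sketch above;
    # the real pipeline may use a different format.
    np.savez("000000000042.npz", boxes=boxes, classes=classes, scores=scores)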
Training and testing

bash run.sh

('run.sh' contains the commands for training and testing on both GCC and SS.)

Acknowledgements

Some components of this repo were built from ImageCaptioning.pytorch and connect-caption-and-trace.