DeeCap

This repository contains the reference code for the paper:

Dynamic Early Exit for Efficient Image Captioning

Data

Running the code requires the images and annotations of the COCO dataset. Please download the zip files containing the images (train2014.zip, val2014.zip) and the zip file containing the annotations (annotations_trainval2014.zip), then extract them; the resulting paths are set as arguments later (a download sketch follows below). Our code supports image features extracted with either a conventional Faster R-CNN or a CLIP model.
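A possible download-and-extract sequence is sketched below. The standard COCO mirror URLs are assumed, and `data/coco` is a hypothetical target directory, not a path this repository requires:

```bash
# Hypothetical layout: keep everything under data/coco (adjust as needed).
mkdir -p data/coco && cd data/coco

# Official COCO 2014 image and annotation archives.
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

# Extract; the resulting paths are passed to the scripts as arguments later.
unzip train2014.zip
unzip val2014.zip
unzip annotations_trainval2014.zip
```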

Training Procedure

Run `python train_deecap.py` with the following arguments:

| Argument | Possible values |
|---|---|
| `--exp_name` | Experiment name (default: `deecap`) |
| `--train_data_path` | Path to the training dataset |
| `--features_path` | Path to the detection features file (optional) |
| `--annotation_folder` | Path to the folder with annotations (optional) |
| `--tokenizer_path` | Path to the tokenizer |
| `--out_dir` | Directory where checkpoints are saved |
| `--batch_size` | Batch size (default: 10) |
| `--lr` | Learning rate (default: 1e-4) |
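A minimal example invocation is sketched below, assuming the data layout from the download sketch above; every path here is a placeholder for your local setup:

```bash
# Placeholder paths; substitute your own dataset, feature, and tokenizer locations.
python train_deecap.py --exp_name deecap \
    --train_data_path data/coco/train_data.json \
    --features_path data/coco/detection_features.hdf5 \
    --annotation_folder data/coco/annotations \
    --tokenizer_path data/tokenizer \
    --out_dir ckpt \
    --batch_size 10 \
    --lr 1e-4
```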

Evaluation

To reproduce the results reported in our paper, download the pretrained model checkpoint and place it in the `ckpt` folder.

Run `python test.py` with the following arguments:

| Argument | Possible values |
|---|---|
| `--batch_size` | Batch size (default: 10) |
| `--features_path` | Path to the detection features file |
| `--annotation_folder` | Path to the folder with COCO annotations |
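For example, assuming the same placeholder paths as in the training sketch above:

```bash
# Placeholder paths; point these at your extracted features and annotations.
python test.py --batch_size 10 \
    --features_path data/coco/detection_features.hdf5 \
    --annotation_folder data/coco/annotations
```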

Acknowledgment

This repository builds on Transformer Image Captioning and the Hugging Face DeeBERT implementation.