TIger
This repository contains the reference code for the ECE model TIger proposed in the paper Explicit Image Caption Editing, accepted to ECCV 2022. Refer to our full paper for detailed explanations and analysis. The dataset and more detailed task information are available in this ECE repository.
If you find this paper helpful for your research, please consider citing it in your publications.
@inproceedings{wang2022explicit,
title={Explicit Image Caption Editing},
author={Wang, Zhen and Chen, Long and Ma, Wenbo and Han, Guangxing and Niu, Yulei and Shao, Jian and Xiao, Jun},
booktitle={Computer Vision--ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXXVI},
pages={113--129},
year={2022}
}
Environment setup
Clone the repository and create the tiger conda environment using the conda.yml file:
conda env create -f conda.yml
conda activate tiger
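As a quick sanity check of the activated environment (assuming PyTorch is among the pinned dependencies, since TIger builds on ViLBERT):
# Minimal environment check; assumes the tiger env pins PyTorch.
import torch
print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())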
Data preparation
COCO_EE and Flickr30K_EE
The processed datasets are provided in the dataset folder; they can also be downloaded directly from here. Both COCO-EE and Flickr30K-EE come with train, eval, and test splits.
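For a quick look at the data, you can load a split in Python; the path and field layout below are hypothetical placeholders, so adjust them to the files you actually have:
import json

# Hypothetical path and layout -- adjust to the downloaded split files.
with open('dataset/COCO_EE/train.json') as f:
    split = json.load(f)

# Each ECE instance pairs an image with a reference caption (the input to be
# edited) and a ground-truth caption (the editing target).
print(len(split), 'instances')
print(split[0])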
Visual Features
For visual token features, we use the bottom-up features (36 regions per image) extracted by a pre-trained Faster R-CNN.
COCO-EE
Download the pre-computed feature file trainval_36.zip (~24.5 GB), unzip it, rename the tsv file to coco_36.tsv, and place it under the datasets/bottom_up/COCO_36 folder. Then run process_bottom_up_feature.py to process the features:
python process_bottom_up_feature.py
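If you want to inspect the raw TSV before processing, the sketch below follows the field layout of the original bottom-up-attention release (an assumption about this file; the processed output of process_bottom_up_feature.py may differ). The same reader works for flickr30k_36.tsv.
import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)  # rows carry large base64 blobs

# Field layout of the original bottom-up-attention TSV release (assumed here).
FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

with open('datasets/bottom_up/COCO_36/coco_36.tsv') as f:
    for row in csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES):
        n = int(row['num_boxes'])  # 36 regions per image for this release
        feats = np.frombuffer(base64.b64decode(row['features']), dtype=np.float32).reshape(n, -1)
        boxes = np.frombuffer(base64.b64decode(row['boxes']), dtype=np.float32).reshape(n, 4)
        print(row['image_id'], feats.shape, boxes.shape)
        break  # peek at the first image only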
Flickr30K-EE
Download the pre-computed feature file flickr30k_36.zip (~6.5 GB, code: xlmd), unzip it, and place flickr30k_36.tsv under the datasets/bottom_up/Flickr30K_36 folder. Then run process_bottom_up_feature.py to process the features:
python process_bottom_up_feature.py
Evaluation
To reproduce the results in the paper, download the pretrained model file pretrained_tiger (~6 GB) and place the checkpoints under the pretrained_models/COCOEE and pretrained_models/Flickr30KEE folders, respectively. Then run:
COCO-EE
python eval.py --from_pretrained_tagger_del pretrained_models/COCOEE/tagger_del.bin --from_pretrained_tagger_add pretrained_models/COCOEE/tagger_add.bin --from_pretrained_inserter pretrained_models/COCOEE/inserter.bin --tasks 1 --batch_size 128 --save_name test_coco_ee --edit_round 5
Flickr30K-EE
python eval.py --from_pretrained_tagger_del pretrained_models/Flickr30KEE/tagger_del.bin --from_pretrained_tagger_add pretrained_models/Flickr30KEE/tagger_add.bin --from_pretrained_inserter pretrained_models/Flickr30KEE/inserter.bin --tasks 4 --batch_size 128 --save_name test_flickr30k_ee --edit_round 5
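For orientation, the loop below sketches how the three checkpoints cooperate over --edit_round passes, following the paper's description: Tagger_del tags each token KEEP or DELETE, Tagger_add tags the gaps where a word should be added, and the Inserter predicts a word for each tagged gap. All names are hypothetical stand-ins, not the actual eval.py API.
# Schematic sketch of TIger's iterative editing (hypothetical names, not eval.py's API).
def edit_caption(image_feats, tokens, tagger_del, tagger_add, inserter, edit_round=5):
    for _ in range(edit_round):
        keep = tagger_del(image_feats, tokens)           # KEEP/DELETE tag per token
        tokens = [t for t, k in zip(tokens, keep) if k]  # drop DELETE-tagged tokens
        slots = tagger_add(image_feats, tokens)          # ADD tag per token gap
        tokens = inserter(image_feats, tokens, slots)    # fill each ADD slot with a word
        if all(keep) and not any(slots):                 # no edits this round: converged
            break
    return tokens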
Expected output
Under results/, you will find the edited results of all experiments.
Training procedure
Download the pretrained weights of ViLBERT from here (~1 GB) and place them under the pretrained_models/ViLBERT-6-Layer folder.
Run python train.py using the following arguments for the different submodules.
COCO-EE
Tagger_del
python train.py --tasks 1 --tagger_loss_ratio 1.5
Tagger_add
python train.py --tasks 2 --tagger_loss_ratio 1.5
Inserter
python train.py --tasks 3
Flickr30K-EE
Tagger_del
python train.py --tasks 4 --tagger_loss_ratio 1.5
Tagger_add
python train.py --tasks 5 --tagger_loss_ratio 1.5
Inserter
python train.py --tasks 6
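For quick reference, the --tasks IDs used in the commands above map to dataset/submodule pairs as follows:
# --tasks IDs as used by the training commands above.
TASKS = {
    1: 'COCO-EE Tagger_del',
    2: 'COCO-EE Tagger_add',
    3: 'COCO-EE Inserter',
    4: 'Flickr30K-EE Tagger_del',
    5: 'Flickr30K-EE Tagger_add',
    6: 'Flickr30K-EE Inserter',
}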
Expected output
Under model_out/, you will find the trained models of all experiments.
Acknowledgment
Special thanks to the authors of ViLBERT and bottom-up-attention, and to the creators of the datasets used in this project.