Explicit Image Caption Editing
This repository contains the datasets and reference code for the paper Explicit Image Caption Editing, accepted to ECCV 2022. Refer to our full paper for detailed instructions and analysis.
Overview
The Explicit Caption Editing (ECE) task is defined as follows. Given an image and a reference caption (Ref-Cap), ECE models aim to explicitly predict a sequence of edit operations (e.g., KEEP/DELETE/ADD) on the Ref-Cap, which transform the Ref-Cap into a caption close to the ground-truth caption (GT-Cap). Typically, the Ref-Cap is slightly misaligned with the image.
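To make the idea of an edit sequence concrete, here is a minimal sketch of applying KEEP/DELETE/ADD operations to a tokenized Ref-Cap. The function name `apply_edits`, the tuple encoding of operations, and the exact semantics (KEEP and DELETE each consume one reference word, ADD inserts a new word without consuming one) are assumptions for illustration; the operation definitions used in the paper may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (operation, word). The word is only used by ADD.
Edit = Tuple[str, Optional[str]]


def apply_edits(ref_tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply a sequence of KEEP/DELETE/ADD edits to a tokenized caption.

    Assumed semantics (illustrative only):
      KEEP   copies the current reference word and moves to the next one.
      DELETE skips the current reference word.
      ADD    inserts a new word and does not consume a reference word.
    """
    output: List[str] = []
    pos = 0  # index of the next reference word to consume
    for op, word in edits:
        if op in ("KEEP", "DELETE"):
            if pos >= len(ref_tokens):
                raise ValueError("edit sequence consumes more words than the Ref-Cap has")
            if op == "KEEP":
                output.append(ref_tokens[pos])
            pos += 1
        elif op == "ADD":
            if word is None:
                raise ValueError("ADD needs a word to insert")
            output.append(word)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    if pos != len(ref_tokens):
        raise ValueError("edit sequence does not cover the whole Ref-Cap")
    return output


# Made-up example, not taken from the datasets.
ref_cap = "a man riding a horse".split()
edits = [
    ("KEEP", None), ("DELETE", None), ("ADD", "woman"),
    ("KEEP", None), ("KEEP", None), ("DELETE", None), ("ADD", "bike"),
]
print(" ".join(apply_edits(ref_cap, edits)))
```

Under these assumed semantics, every reference word receives exactly one KEEP or DELETE decision, which is why the sketch rejects sequences that do not cover the whole Ref-Cap.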
ECE datasets
The ECE datasets are COCO-EE and Flickr30K-EE.
Specifically, COCO-EE was built from the MSCOCO dataset, and Flickr30K-EE was built from the e-ViL and Flickr30K datasets.
Each ECE instance contains three main fields:
- image_id, the original image ID of the given image in MSCOCO or Flickr30K-EE.
- Ref-Cap, the reference caption which needs to be edited.
- GT-Cap, the ground-truth caption of the given image and also the editing target.
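The three fields above could be represented in code roughly as follows. This is a sketch under assumptions: the class `ECEInstance`, the helper `parse_instance`, and the idea that each record is a dictionary keyed by `image_id`, `Ref-Cap` and `GT-Cap` are illustrative, and the actual key names and file layout in the dataset folder may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ECEInstance:
    """One editing instance (hypothetical container, not part of the repository)."""
    image_id: str  # original image ID in the source dataset
    ref_cap: str   # reference caption to be edited
    gt_cap: str    # ground-truth caption, i.e. the editing target


def parse_instance(record: Dict[str, Any]) -> ECEInstance:
    """Build an ECEInstance from a dict; the key names are assumed, not confirmed."""
    return ECEInstance(
        image_id=str(record["image_id"]),
        ref_cap=record["Ref-Cap"],
        gt_cap=record["GT-Cap"],
    )


# Made-up record for illustration only.
example = parse_instance(
    {"image_id": 123, "Ref-Cap": "a man riding a horse", "GT-Cap": "a woman riding a bike"}
)
print(example)
```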
Examples from COCO-EE and Flickr30K-EE
Statistical summary of the COCO-EE and Flickr30K-EE
Statistic | COCO-EE Train | COCO-EE Dev | COCO-EE Test | Flickr30K-EE Train | Flickr30K-EE Dev | Flickr30K-EE Test |
---|---|---|---|---|---|---|
#Editing instances | 97,567 | 5,628 | 5,366 | 108,238 | 4,898 | 4,910 |
#Images | 52,587 | 3,055 | 2,948 | 29,783 | 1,000 | 1,000 |
Mean Reference Caption Length | 10.3 | 10.2 | 10.1 | 7.3 | 7.4 | 7.4 |
Mean Ground-Truth Caption Length | 9.7 | 9.8 | 9.8 | 6.2 | 6.3 | 6.3 |
Mean Edit Distance | 10.9 | 11.0 | 10.9 | 8.8 | 8.8 | 8.9 |
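For readers who want to compute a distance between a Ref-Cap and a GT-Cap themselves, the sketch below implements a standard word-level Levenshtein distance (insertions, deletions and substitutions each cost 1). The table does not state how its edit distance is defined, so this is an assumed metric and its values may not match the numbers reported above; the function name `word_edit_distance` is hypothetical.

```python
def word_edit_distance(ref_cap: str, gt_cap: str) -> int:
    """Word-level Levenshtein distance between two whitespace-tokenized captions."""
    a = ref_cap.split()
    b = gt_cap.split()
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty caption takes i deletions
        for j, word_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if word_a == word_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Made-up captions for illustration only.
print(word_edit_distance("a man riding a horse", "a woman riding a bike"))
```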
Dataset Construction
The processed datasets are in the dataset folder; they can also be downloaded directly from here. They include COCO-EE and Flickr30K-EE in train, dev and test splits.
Or, you can follow the instructions below to set up the environment and construct them:
COCO-EE Construction
- Set up the coco-edit submodule and follow the instructions from this.
Flickr30K-EE Construction
- Set up the environment
conda create -n flkree python=3.7
conda activate flkree
conda install json
conda install csv
- Prepare the esnlive data and the output folder
- Construct Flickr30K-EE
python construct_flickr30k_ee.py --split <split>
The ECE model: TIger
The code for our proposed ECE model TIger is now available here.