Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval.

This is the PyTorch code repository for the TMLR paper: Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval, by Maurits Bleeker<sup>1</sup>, Andrew Yates<sup>1</sup>, and Maarten de Rijke<sup>1</sup>.

If you have any questions, please open an issue or feel free to send an email to m.j.r.bleeker@uva.nl.

<sup>1</sup>University of Amsterdam, The Netherlands

Introduction

The code of this work is mainly based on the code repositories of PCME, VSRN, VSE++, and TERN.

Not all experiments we have run made it into the final version of the paper.

Environment, dependencies, and required packages

This project has been developed and tested in Python 3.6.13. All the libraries and Python dependencies used in this work are provided in src/requirements.txt.

pip install -r src/requirements.txt
python -c 'import nltk; nltk.download("punkt", download_dir="/downloads")'

Data and data processing

For this work, we use the Flickr30k (F30k) and MS-COCO Captions (COCO) datasets, including the CrissCrossed Captions (CxC) annotations.

We store the entire dataset in a single pickle file to make training faster than loading the images one by one from disk. Note, however, that this requires a large amount of RAM.
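As a minimal sketch of this approach (using a hypothetical dataset layout, not the repository's exact schema), serializing the whole dataset into one pickle and reading it back looks like this:

```python
import io
import pickle

# Hypothetical layout: one dict holding raw image bytes and caption lists,
# so training indexes into memory instead of opening files per sample.
dataset = {
    "images": {0: b"<raw-jpeg-bytes>", 1: b"<raw-jpeg-bytes>"},
    "captions": {0: ["a dog runs in the park"], 1: ["a cat sleeps"]},
}

buf = io.BytesIO()  # stands in for a file on disk
pickle.dump(dataset, buf, protocol=pickle.HIGHEST_PROTOCOL)

buf.seek(0)
restored = pickle.load(buf)  # the whole dataset is now resident in RAM
```

The trade-off is exactly the one noted above: a single `pickle.load` brings everything into RAM at once, so per-sample access is fast but peak memory usage equals the dataset size.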

Please note that most pre-processing files use fixed file paths; change these to the locations where you have stored the data/output files.

Please use the following URLs for the precomputed VSRN and TERN features.

Data processing

The caption annotations and train/val/test splits for the Flickr30k and MS-COCO datasets can be downloaded here. In the folder annotations/f30k we provide the JSON file for the Flickr30k dataset. The JSON file for MS-COCO is too big to upload to GitHub.

To generate the pickle file, run the function process_dataset(json_file, img_folder, dset_name, root, k=5) from the file data/data_processing.py.

Vocabulary classes

Each dataset has a different vocabulary. In the file data/vocab.py the Vocabulary class is provided, together with the build_vocab(captions, threshold=0) function. In the file notebooks/data_processing/vocab.ipynb, we build the vocab for each dataset. We provide a pickle file of the MS-COCO and F30k Vocab class used in this work in the folder vocab/, so you don't have to generate the vocab files yourself.
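The actual implementation lives in data/vocab.py; purely as an illustration of the interface described above (and using plain whitespace tokenization rather than the repository's tokenizer), a simplified Vocabulary class with build_vocab(captions, threshold=0) could look like:

```python
from collections import Counter


class Vocabulary:
    """Maps tokens to integer ids; unknown tokens fall back to <unk>."""

    def __init__(self):
        self.word2idx, self.idx2word = {}, {}
        for tok in ("<pad>", "<start>", "<end>", "<unk>"):
            self.add_word(tok)

    def add_word(self, word):
        if word not in self.word2idx:
            idx = len(self.word2idx)
            self.word2idx[word] = idx
            self.idx2word[idx] = word

    def __call__(self, word):
        return self.word2idx.get(word, self.word2idx["<unk>"])

    def __len__(self):
        return len(self.word2idx)


def build_vocab(captions, threshold=0):
    # Count tokens over all captions; keep those seen at least `threshold` times.
    counts = Counter(tok for cap in captions for tok in cap.lower().split())
    vocab = Vocabulary()
    for word, n in counts.items():
        if n >= threshold:
            vocab.add_word(word)
    return vocab
```

This is why each dataset needs its own vocabulary pickle: the token-to-id mapping depends on which captions were counted, so a model trained with the F30k vocab cannot reuse COCO ids.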

Adding the CxC annotations

To add the CxC annotations to the dataset(s), run notebooks/data_processing/CrissCrossed.ipynb. Please use the following URL for the CxC annotations: CrissCrossed Captions.

Targets generator

After generating the pickle files for the MS-COCO and Flickr30k datasets, the latent targets should be added. To add the latent targets to the pickle files, run the notebook notebooks/data_processing/caption_targets.ipynb.
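The notebook handles this end-to-end. Conceptually (a hedged sketch, assuming the latent targets are fixed per-caption embeddings produced by some pretrained sentence encoder; the stand-in `embed` function below is purely illustrative and not the repository's target generator), attaching targets to a pickled dataset amounts to:

```python
import io
import pickle
import zlib

import numpy as np


def embed(caption, dim=8):
    # Stand-in for a pretrained sentence encoder: a deterministic random
    # vector per token (seeded via CRC32), mean-pooled over the caption.
    vecs = []
    for tok in caption.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        vecs.append(rng.standard_normal(dim))
    return np.mean(vecs, axis=0)


dataset = {"captions": {0: ["a dog runs"], 1: ["a cat sleeps"]}}

# Attach one target vector per caption, then re-serialize the dataset.
dataset["targets"] = {
    img_id: [embed(cap) for cap in caps]
    for img_id, caps in dataset["captions"].items()
}

buf = io.BytesIO()
pickle.dump(dataset, buf, protocol=pickle.HIGHEST_PROTOCOL)
```

Because the targets are computed once and stored alongside the captions, they stay fixed during training instead of being recomputed every epoch.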

Additional data processing

In the folder notebooks/data_processing, additional notebooks are provided for further data processing for the CxC, VSRN, and TERN datasets.

Training and reproducing experiments

To run the training script, simply run:

python train.py --yaml_file configs/f30k/f30k_tmlr.yaml

for the F30k dataset, and

python train.py --yaml_file configs/coco/coco_tmlr.yaml

for the COCO dataset.

In configs/{coco, f30k}/{coco, f30k}_tmlr.yaml, the config.yaml files are provided for the base models of experiments 5.1-5.4.

For experiment 5.5, run either python train.py --yaml_file configs/coco/coco_vsrn.yaml or python train.py --yaml_file configs/f30k/f30k_vsrn.yaml to train with the VSRN image encoder, or python train.py --yaml_file configs/coco/coco_tern.yaml to train with TERN.

Training hyperparameters used in this work

To train other model configurations, we change only a subset of the training hyperparameters. All the other hyperparameters can be found in configs/{coco, f30k}/{coco, f30k}_tmlr.yaml.

Job files

All experiments for the paper were run on a SLURM compute cluster. The job and hyperparameter files for each experiment are provided in the folder jobs/{coco, f30k}/{TERN, VSRN, tmlr}.

Evaluation

To evaluate a trained model on the test set, run:

python evaluate.py --path_to_model <path to model>/model_best.pth

Use the model checkpoint with the highest score on the validation set. To evaluate on the validation set instead, add the --split val flag.

Paper

In the folder /paper/plots, the data and a Jupyter notebook are provided to generate the plots in Section 5.3 of the paper.

Citation

If you use this code to produce results for your scientific publication, or if you share a copy or fork, please cite our TMLR paper:

@article{bleeker2023reducing,
    title={Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval},
    author={Maurits Bleeker and Andrew Yates and Maarten de Rijke},
    journal={Transactions on Machine Learning Research},
    issn={2835-8856},
    year={2023},
    url={https://openreview.net/forum?id=T1XtOqrVKn},
    note={}
}