Globetrotter

Code from the paper Globetrotter: Connecting Languages by Connecting Images.

Project website: globetrotter.cs.columbia.edu.

If you use the code or the dataset, please consider citing the paper as:

@article{globetrotter,
  title={Globetrotter: Connecting Languages by Connecting Images},
  author={Sur\'is, D\'idac and Epstein, Dave and Vondrick, Carl},
  journal={Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

An example command line for training the model can be found in scripts/train_globetrotter.sh. To reproduce the numbers from the paper, use the released pretrained models and the scripts/test/*.sh scripts. Before running those scripts, extract features by running scripts/extract_features/*.sh. In each bash file, set the parameters to the corresponding paths (dataset, extracted features).

Run python main.py --help for information on arguments.

Make sure the external libraries listed in requirements.txt are installed.

Data

We collected the Globetrotter dataset for this project. It contains captions in 52 different languages for images from three different captioning datasets: MSCOCO, Flickr30k and Conceptual Captions. In order to train new models, you will need to download the images from the corresponding links.

Our collected captions can be downloaded from this link (dataset.tar.gz). The provided file already contains the folder structure required to run our code, which follows the folder structure of the original datasets. The file also contains the human test translations. The folder translated_independent contains sentences that describe different images in each language. The folder translated_alllangs contains translations that describe the same image in all languages (for testing purposes).

Other dataset information necessary to run our models (splits, tokenizer and word2vec information) can be found at this link (dataset_info.tar.gz).

As a reminder, you can extract the content from a .tar.gz file by using tar -xzvf archive.tar.gz.
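If you prefer to unpack the archives from Python, the standard-library tarfile module does the same job. This is only a minimal sketch: the helper name extract_archive and the paths in the usage note are placeholders, not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that reject
            # unsafe members (for example, paths escaping dest_dir).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Names of the files and folders now present directly under dest_dir.
    return sorted(p.name for p in dest.iterdir())
```

For example, extract_archive("dataset.tar.gz", "/path/to/your/dataset") would unpack the captions archive into a directory of your choice.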

Set the root dataset directory with the --dataset_path argument, and use --dataset_info_path to point to the dataset information files. To run the code without images, use the not_use_images flag.

Pretrained models

The pretrained models reported in our paper can be found at this link (checkpoints.tar.gz).

Each folder (one for each model) contains a .pth file with the checkpoint, as well as a .json file with the configuration.
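To see which models are available after extraction, a short script can walk the checkpoint directory and read each configuration. This is a sketch under assumptions: the helper list_checkpoints is hypothetical, and since the exact file names inside each folder are not stated here, it simply picks the first .pth and .json file it finds.

```python
import json
from pathlib import Path


def list_checkpoints(checkpoint_dir):
    """Map each model folder name to its .pth path and parsed .json config."""
    models = {}
    for folder in sorted(Path(checkpoint_dir).iterdir()):
        if not folder.is_dir():
            continue
        pth_files = sorted(folder.glob("*.pth"))
        json_files = sorted(folder.glob("*.json"))
        # Skip folders that do not look like a model folder.
        if not pth_files or not json_files:
            continue
        with open(json_files[0]) as f:
            config = json.load(f)
        # Assumption: one checkpoint and one config per folder, so take the first.
        models[folder.name] = {"checkpoint": pth_files[0], "config": config}
    return models
```

The folder names returned as keys are the natural candidates for the model name you refer to when resuming or testing.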

To resume training or to test from one of these pretrained models, set the --resume flag to True. Extract the models into the /path/to/your/checkpoints directory that you pass to the --checkpoint_dir argument, and select the specific model with the --resume_name argument.
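As an illustration, the arguments named in this README can be assembled into a command line as sketched below. The helper build_resume_command and the model name in the usage note are hypothetical, and the exact value syntax (for example, passing True after --resume) is an assumption that should be checked against python main.py --help and the provided scripts.

```python
import shlex


def build_resume_command(checkpoint_dir, resume_name,
                         dataset_path=None, dataset_info_path=None):
    """Build the argument list for running main.py from a pretrained model."""
    cmd = [
        "python", "main.py",
        "--resume", "True",                     # assumed value syntax
        "--checkpoint_dir", str(checkpoint_dir),
        "--resume_name", str(resume_name),
    ]
    # The dataset arguments are optional here; add them only when given.
    if dataset_path is not None:
        cmd += ["--dataset_path", str(dataset_path)]
    if dataset_info_path is not None:
        cmd += ["--dataset_info_path", str(dataset_info_path)]
    return cmd


if __name__ == "__main__":
    # "my_model" is a placeholder for one of the extracted model folders.
    command = build_resume_command("/path/to/your/checkpoints", "my_model")
    print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be passed to subprocess.run, or printed and pasted into one of the bash scripts.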

If you have any questions or run into problems, feel free to send us an email.