Home

Awesome

MCSE: Multimodal Contrastive Learning of Sentence Embeddings

This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multimodal Contrastive Learning of Sentence Embeddings. If you find this repository useful, please consider citing our paper.

Contact: Miaoran Zhang (mzhang@lsv.uni-saarland.de)

Pre-trained Models & Results

ModelAvg. STS
mcse-flickr-bert-base-uncased [Google Drive] [Huggingface]77.70
mcse-flickr-roberta-base [Google Drive] [Huggingface]78.44
mcse-coco-bert-base-uncased [Google Drive] [Huggingface]77.08
mcse-coco-roberta-base [Google Drive] [Huggingface]78.17

Note: flickr indicates that models are trained on wiki+flickr, and coco indicates that models are trained on wiki+coco.

Quickstart

Setup

pip install -r requirements.txt

Data Preparation

Please organize the data directory as following:

REPO ROOT
|
|--data    
|  |--wiki1m_for_simcse.txt  
|  |--flickr_random_captions.txt    
|  |--flickr_resnet.hdf5    
|  |--coco_random_captions.txt    
|  |--coco_resnet.hdf5  

Wiki1M

wget https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/resolve/main/wiki1m_for_simcse.txt

Flickr30k & MS-COCO
You can either download the preprocessed data we used:
(annotation sources: flickr30k-entities and coco).

Or preprocess the data by yourself (take Flickr30k as an example):

  1. Download the flickr30k-entities.
  2. Request access to the flickr-images from here. Note that the use of the images much abide by the Flickr Terms of Use.
  3. Run script:
    unzip ${path_to_flickr-entities}/annotations.zip
    
    python preprocess/prepare_flickr.py \
        --flickr_entities_dir ${path_to_flickr-entities}  \  
        --flickr_images_dir ${path_to_flickr-images} \
        --output_dir data/
        --batch_size 32
    

Train & Evaluation

  1. Prepare the senteval datasets for evaluation:

    cd SentEval/data/downstream/
    bash download_dataset.sh
    
  2. Run scripts:

    # For example:  (more examples are given in scripts/.)
    sh scripts/run_wiki_flickr.sh
    

    Note: In the paper we run experiments with 5 seeds (0,1,2,3,4). You can find the detailed parameter settings in Appendix.

Acknowledgements