Improving Cross-Modal Retrieval with Set of Diverse Embeddings

This repository contains the official source code for our paper:

Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim, Namyup Kim, and Suha Kwak
POSTECH CSE
CVPR (Highlight), Vancouver, 2023.

Acknowledgement

Parts of our code are adapted from the following repositories.

Dataset

data 
├─ coco_download.sh  
├─ coco # can be downloaded with coco_download.sh 
│  ├─ images
│  │  └─ ......
│  └─ annotations 
│     └─ ......
├─ coco_butd
│  └─ precomp  
│     ├─ train_ids.txt
│     ├─ train_caps.txt
│     └─ ......   
├─ f30k 
│  ├─ images
│  │  └─ ......
│  ├─ dataset_flickr30k.json
│  └─ ......  
└─ f30k_butd
   └─ precomp  
      ├─ train_ids.txt
      ├─ train_caps.txt
      └─ ......

vocab # included in this repo
├─ coco_butd_vocab.pkl
└─ ......
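The vocabulary files bundled under vocab are pickled objects. A minimal loading sketch is below; note that `load_vocab` is a hypothetical helper, and if the pickle stores an instance of the repo's own vocabulary class, that class must be importable when unpickling.

```python
import pickle

def load_vocab(path):
    """Load a pickled vocabulary file (e.g. vocab/coco_butd_vocab.pkl).

    Assumption: the file was written with pickle.dump. If it contains a
    custom vocabulary object, its class must be on the import path.
    """
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage (paths from the tree above):
# vocab = load_vocab("vocab/coco_butd_vocab.pkl")
# print(type(vocab))
```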

Note: Downloaded datasets should be placed according to the directory structure presented above.
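Before launching training, a short sanity check that the expected top-level directories exist can save a failed run. The script below is a sketch, not part of the repo; the path list mirrors the tree above.

```python
import os

# Expected directories, taken from the layout shown above.
EXPECTED_DIRS = [
    "data/coco/images",
    "data/coco/annotations",
    "data/coco_butd/precomp",
    "data/f30k/images",
    "data/f30k_butd/precomp",
    "vocab",
]

def missing_paths(root="."):
    """Return the expected directories that are absent under `root`."""
    return [p for p in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```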

Requirements

You can install the requirements using conda:

conda create --name <env> --file requirements.txt

Training on COCO

sh train_eval_coco.sh