Improving Cross-Modal Retrieval with Set of Diverse Embeddings

This repository contains the official source code for our paper:

Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim, Namyup Kim, and Suha Kwak
POSTECH CSE
CVPR (Highlight), Vancouver, 2023.

Acknowledgement

Parts of our code are adapted from the following repositories.

Dataset

data 
├─ coco_download.sh  
├─ coco # can be downloaded with coco_download.sh 
│  ├─ images
│  │  └─ ......
│  └─ annotations 
│     └─ ......
├─ coco_butd
│  └─ precomp  
│     ├─ train_ids.txt
│     ├─ train_caps.txt
│     └─ ......   
├─ f30k 
│  ├─ images
│  │  └─ ......
│  ├─ dataset_flickr30k.json
│  └─ ......  
└─ f30k_butd
   └─ precomp  
      ├─ train_ids.txt
      ├─ train_caps.txt
      └─ ......

vocab # included in this repo
├─ coco_butd_vocab.pkl
└─ ......
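The vocabulary files bundled under vocab are pickled objects. A minimal loading sketch is below; note that `load_vocab` is a hypothetical helper, and if the pickle stores an instance of the repo's own vocabulary class, that class must be importable when unpickling.

```python
import pickle

def load_vocab(path):
    """Load a pickled vocabulary file (e.g. vocab/coco_butd_vocab.pkl).

    Assumption: the file was written with pickle.dump. If it contains a
    custom vocabulary object, its class must be on the import path.
    """
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage (paths from the tree above):
# vocab = load_vocab("vocab/coco_butd_vocab.pkl")
# print(type(vocab))
```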

Note: Downloaded datasets should be placed according to the directory structure presented above.
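Before launching training, a short sanity check that the expected top-level directories exist can save a failed run. The script below is a sketch, not part of the repo; the path list mirrors the tree above.

```python
import os

# Expected directories, taken from the layout shown above.
EXPECTED_DIRS = [
    "data/coco/images",
    "data/coco/annotations",
    "data/coco_butd/precomp",
    "data/f30k/images",
    "data/f30k_butd/precomp",
    "vocab",
]

def missing_paths(root="."):
    """Return the expected directories that are absent under `root`."""
    return [p for p in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```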

Requirements

You can install the requirements using conda:

conda create --name <env> --file requirements.txt

Training on COCO

sh train_eval_coco.sh