PyTorch implementation of Cross-modal Active Complementary Learning with Self-refining Correspondence (CRCL, NeurIPS 2023), a method that addresses the noisy correspondence problem in image-text matching.

Supplementary material

Introduction

Datasets

Our data directory is structured as follows:

data
├── f30k_precomp # pre-computed BUTD region features for Flickr30K, provided by SCAN
│     ├── train_ids.txt
│     ├── train_caps.txt
│     ├── ......
│
├── coco_precomp # pre-computed BUTD region features for COCO, provided by SCAN
│     ├── train_ids.txt
│     ├── train_caps.txt
│     ├── ......
│
├── cc152k_precomp # pre-computed BUTD region features for cc152k, provided by NCR
│     ├── train_ids.txt
│     ├── train_caps.tsv
│     ├── ......
│
└── vocab  # vocab files provided by SCAN and NCR
      ├── f30k_precomp_vocab.json
      ├── coco_precomp_vocab.json
      └── cc152k_precomp_vocab.json
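
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `missing_paths` is a hypothetical helper, and it only checks the `train_ids.txt` and vocabulary files named in the tree, since the remaining files are elided there.

```python
from pathlib import Path

# Dataset folder names taken from the directory tree above.
DATASETS = ("f30k_precomp", "coco_precomp", "cc152k_precomp")


def missing_paths(data_root, vocab_root, datasets=DATASETS):
    """Return the expected files that do not exist (hypothetical helper)."""
    data_root, vocab_root = Path(data_root), Path(vocab_root)
    expected = []
    for name in datasets:
        # Every dataset folder in the tree lists a train_ids.txt file.
        expected.append(data_root / name / "train_ids.txt")
        # Vocabulary files follow the <dataset>_vocab.json naming in the tree.
        expected.append(vocab_root / f"{name}_vocab.json")
    return [path for path in expected if not path.exists()]
```

For example, `missing_paths("data", "data/vocab")` returns an empty list when all six files are present, and otherwise lists the files that still need to be downloaded.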

MS-COCO and Flickr30K

We follow SCAN to obtain image features and vocabularies.

CC152K

Following NCR, we use a subset of Conceptual Captions (CC), named CC152K. CC152K contains 150,000 training samples from the CC training split, plus 1,000 validation samples and 1,000 testing samples from the CC validation split.

Download Dataset

Training and Evaluation

Training new models

sh train.sh


#!/bin/bash

# More recommended hyperparameter settings can be found in Table 1 of the supplementary material at https://openreview.net/attachment?id=UBBeUjTja8&name=supplementary_material

filename=f30k
module_name=SGR
# module_name options: VSEinfty, SAF, SGR
gpus=3
# schedules=30
# schedules='2,2,2,20'
# lr_update=10
# schedules='5,5,5,40'
schedules='5,5,5,30'
lr_update=15
noise_rate=0.8
warm_epoch=2
tau=0.05
alpha=0.8
 
folder_name=./NCR_logs/${filename}_${module_name}_${noise_rate} 

# Noise index file; the name follows the dataset and noise rate set above
noise_file=./noise_index/${filename}_precomp_${noise_rate}.npy

# Set these to your local data and vocabulary directories
data_path='/home_bak/hupeng/data/data'
vocab_path='/home_bak/hupeng/data/vocab'


CUDA_VISIBLE_DEVICES=$gpus python train.py --val_step 1000 --gpu $gpus --alpha $alpha --tau $tau \
    --data_name ${filename}_precomp --data_path $data_path --vocab_path $vocab_path \
    --warm_epoch $warm_epoch --schedules $schedules --lr_update $lr_update \
    --noise_file $noise_file --noise_ratio $noise_rate \
    --module_name $module_name --folder_name $folder_name
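
The script above trains with a synthetic noise rate (`noise_rate=0.8`) and a saved noise index file. As an illustration of what simulated noisy correspondence can look like, here is a minimal sketch that mismatches a fraction of pairs by permuting their indices. It is an assumption, not the repository's code: `make_noisy_index` is a hypothetical helper, it assumes one caption per image for simplicity, and the array layout that `train.py` expects in `--noise_file` is not stated here.

```python
import numpy as np


def make_noisy_index(num_pairs, noise_ratio, seed=0):
    """Permute the image index of a random fraction of captions.

    Hypothetical helper: entry i is the image position paired with caption i.
    """
    rng = np.random.default_rng(seed)
    index = np.arange(num_pairs)
    num_noisy = int(noise_ratio * num_pairs)
    # Pick which captions receive a (possibly) wrong image.
    chosen = rng.permutation(num_pairs)[:num_noisy]
    # Shuffle the pairing only among the chosen captions; the rest stay clean.
    index[chosen] = rng.permutation(index[chosen])
    return index
```

Because a random permutation can map an element to itself, the fraction of truly mismatched pairs is at most `noise_ratio`. Check the repository's data loader before saving such an array with `np.save` for use as a noise file.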

Evaluation

python eval.py

Citation

If CRCL is useful for your research, please cite the following paper:


@article{qin2024cross,
  title={Cross-modal Active Complementary Learning with Self-refining Correspondence},
  author={Qin, Yang and Sun, Yuan and Peng, Dezhong and Zhou, Joey Tianyi and Peng, Xi and Hu, Peng},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

License

Apache License 2.0