PyTorch implementation for Learning with Noisy Correspondence for Cross-modal Matching (NeurIPS 2021 Oral).
Update
- 2022-10-17: We provide the image URLs of CC152K, a subset of Conceptual Captions (CC), which may be helpful for your research.
|-- cc152k
| |-- dev_caps_img_urls.csv
| |-- test_caps_img_urls.csv
| `-- train_caps_img_urls.csv
Use img2dataset to download the images listed in the CSV files.
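As a rough illustration of the download step, the sketch below only assembles an img2dataset command line for each CSV file and prints it; it does not run the download. The column names `url` and `caption`, the output folder `cc152k_images`, and the helper `build_img2dataset_command` are assumptions made for this example, so please check the header row of the CSV files and adjust them before use.

```python
import shlex


def build_img2dataset_command(csv_path, output_folder,
                              url_col="url", caption_col="caption"):
    """Assemble an img2dataset CLI call for one CSV file.

    url_col and caption_col are ASSUMED column names; inspect the CSV header
    and replace them with the real ones.
    """
    return [
        "img2dataset",
        "--url_list", csv_path,
        "--input_format", "csv",
        "--url_col", url_col,
        "--caption_col", caption_col,
        "--output_format", "files",
        "--output_folder", output_folder,
    ]


for split in ("train", "dev", "test"):
    cmd = build_img2dataset_command(
        "cc152k/{}_caps_img_urls.csv".format(split),
        "cc152k_images/{}".format(split),  # hypothetical output location
    )
    # Print a shell-ready command instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed commands can be pasted into a shell once the column names have been confirmed.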
Introduction
NCR framework
<img src="https://github.com/XLearning-SCU/2021-NeurIPS-NCR/blob/main/framework.png" width="860" height="268" />
Requirements
- Python 3.7
- PyTorch ~1.7.1
- numpy
- scikit-learn
- Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt
Datasets
MS-COCO and Flickr30K
We follow SCAN to obtain image features and vocabularies.
CC152K
We use a subset of Conceptual Captions (CC), named CC152K. CC152K contains 150,000 training samples from the CC training split, plus 1,000 validation samples and 1,000 testing samples from the CC validation split. We follow the pre-processing step in SCAN to obtain the image features and vocabularies.
Training and Evaluation
Training new models from scratch
Modify the data_path and vocab_path, then train and evaluate the model(s):
# CC152K
python ./NCR/run.py --gpu 0 --workers 2 --warmup_epoch 10 --data_name cc152k_precomp --data_path data_path --vocab_path vocab_path
# MS-COCO: noise_ratio = {0, 0.2, 0.5}
python ./NCR/run.py --gpu 0 --workers 2 --warmup_epoch 10 --data_name coco_precomp --num_epochs 20 --lr_update 10 --noise_ratio 0.2 --data_path data_path --vocab_path vocab_path
# Flickr30K: noise_ratio = {0, 0.2, 0.5}
python ./NCR/run.py --gpu 0 --workers 2 --warmup_epoch 5 --data_name f30k_precomp --noise_ratio 0.2 --data_path data_path --vocab_path vocab_path
Each command trains a model from scratch and then evaluates the best model.
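To make the `--noise_ratio` option in the MS-COCO and Flickr30K commands more concrete, here is a minimal sketch of how noisy correspondence could be simulated, assuming noise is injected by shuffling the captions of a randomly chosen fraction of training pairs. The function `inject_noisy_correspondence` is a hypothetical helper written for this illustration, not the code in `./NCR`.

```python
import random


def inject_noisy_correspondence(num_pairs, noise_ratio, seed=0):
    """Return caption_of, where caption_of[i] is the caption paired with image i.

    Illustrative assumption: a noise_ratio fraction of pairs is selected at
    random and their captions are shuffled among themselves.
    """
    rng = random.Random(seed)
    caption_of = list(range(num_pairs))        # clean data: image i <-> caption i
    num_noisy = int(noise_ratio * num_pairs)   # number of pairs to corrupt
    noisy_idx = rng.sample(range(num_pairs), num_noisy)
    shuffled = noisy_idx[:]
    rng.shuffle(shuffled)
    for src, dst in zip(noisy_idx, shuffled):
        caption_of[src] = dst                  # image src now gets caption dst
    return caption_of


pairing = inject_noisy_correspondence(num_pairs=10, noise_ratio=0.2)
mismatched = sum(1 for i, c in enumerate(pairing) if i != c)
print("{} of {} pairs are mismatched".format(mismatched, len(pairing)))
```

Because a shuffle can map an index back to itself, the number of truly mismatched pairs in this sketch is at most `noise_ratio * num_pairs`, not always exactly that value.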
Pre-trained models and evaluation
The pre-trained models are available here:
- CC152K model Download
- MS-COCO 0% noise model Download
- MS-COCO 20% noise model Download
- MS-COCO 50% noise model Download
- F30K 0% noise model Download
- F30K 20% noise model Download
- F30K 50% noise model Download
Modify the model_path, data_path, and vocab_path in the evaluation.py file. Then run evaluation.py:
python ./NCR/evaluation.py
Note that for MS-COCO, please set split to testall, and fold5 to false (5K evaluation) or true (five-fold 1K evaluation).
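The difference between the two MS-COCO protocols can be illustrated with a toy example. The sketch below assumes the usual SCAN-style convention: full evaluation ranks every caption against every image, while fold evaluation splits the test images into equal folds, ranks only within each fold, and averages the scores. The helpers `recall_at_k` and `fold_averaged_recall` are hypothetical names for this illustration and are not taken from evaluation.py; the toy data uses 4 images with one caption each instead of 5,000 images with five captions each.

```python
def recall_at_k(sims, k, caps_per_image=5):
    """Image-to-text Recall@K in percent.

    sims[i][j] is the similarity of image i and caption j; the captions of
    image i are assumed to sit at columns [i * cpi, (i + 1) * cpi).
    """
    hits = 0
    for i, row in enumerate(sims):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        truth = range(i * caps_per_image, (i + 1) * caps_per_image)
        if any(j in truth for j in ranked):
            hits += 1
    return 100.0 * hits / len(sims)


def fold_averaged_recall(sims, k, fold_size, caps_per_image=5):
    """Average Recall@K over consecutive folds of fold_size images."""
    scores = []
    for start in range(0, len(sims), fold_size):
        end = start + fold_size
        cols = slice(start * caps_per_image, end * caps_per_image)
        fold = [row[cols] for row in sims[start:end]]  # rank within the fold only
        scores.append(recall_at_k(fold, k, caps_per_image))
    return sum(scores) / len(scores)


# Toy similarities: image 2 prefers caption 0, which lies outside its own fold.
sims = [
    [0.9, 0.1, 0.2, 0.3],
    [0.2, 0.8, 0.1, 0.1],
    [0.7, 0.1, 0.6, 0.2],
    [0.1, 0.2, 0.3, 0.9],
]
print(recall_at_k(sims, k=1, caps_per_image=1))                         # 75.0
print(fold_averaged_recall(sims, k=1, fold_size=2, caps_per_image=1))   # 100.0
```

In this toy case the fold-averaged score is higher because each image competes against fewer distractors, which is why 1K and 5K results are reported separately.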
Experiment Results:
<img src="https://github.com/XLearning-SCU/2021-NeurIPS-NCR/blob/main/mscoco_flickr30k.png" width="740" height="434" />
<img src="https://github.com/XLearning-SCU/2021-NeurIPS-NCR/blob/main/cc152k.png" width="565" height="238" />
Citation
If NCR is useful to your research, please cite the following paper:
@article{huang2021learning,
title={Learning with Noisy Correspondence for Cross-modal Matching},
author={Huang, Zhenyu and Niu, Guocheng and Liu, Xiao and Ding, Wenbiao and Xiao, Xinyan and Wu, Hua and Peng, Xi},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
License
Acknowledgements
The code is based on SGRAF and SCAN licensed under Apache 2.0.