LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)
By combining LinCIR with RTD, we can achieve:
Welcome to the official PyTorch implementation of LinCIR!
LinCIR is an approach to Composed Image Retrieval (CIR) that is trained on language alone, which makes zero-shot composed image retrieval possible without costly image triplets. Dive into zero-shot composed image retrieval with us!
Authors:
Geonmo Gu*<sup>1</sup>, Sanghyuk Chun*<sup>2</sup>, Wonjae Kim<sup>2</sup>, Yoohoon Kang<sup>1</sup>, Sangdoo Yun<sup>2</sup>
<sup>1</sup> NAVER Vision <sup>2</sup> NAVER AI Lab
* The first two authors contributed equally.
⭐ Overview
<img src=".github/figure_001.png" height="256">
The Composed Image Retrieval (CIR) task retrieves a target image from a query that combines a reference image with a text describing the desired modification. Traditional CIR methods are trained on expensive triplets of query image, query text, and target image, which limits their scalability.
Enter LinCIR, a CIR framework that is trained on language alone. It relies on a self-supervision objective called self-masking projection (SMP): the text latent embedding is projected into the token embedding space, a new text is built by replacing the keyword tokens of the original text with this projection, and the new and original texts are trained to share the same latent embedding. This allows LinCIR to be trained using text datasets alone.
LinCIR is both efficient and effective. For instance, LinCIR with a CLIP ViT-G backbone is trained in just 48 minutes and outperforms existing methods in zero-shot composed image retrieval on four benchmark datasets: CIRCO, GeneCIS, FashionIQ, and CIRR. It even surpasses supervised methods on FashionIQ!
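To make the SMP objective concrete, here is a toy, dependency-free sketch of its forward computation for a single caption. It is not the repository's code: the mean-pooling `encode` stands in for the frozen CLIP text encoder, `phi` is a hypothetical linear projection (LinCIR's actual projection module may differ), the squared L2 loss is an assumption, and the keywords are hard-coded rather than extracted automatically.

```python
import random

DIM = 4
rng = random.Random(0)

# Hypothetical toy vocabulary; each token gets a fixed random embedding.
VOCAB = ["a", "photo", "of", "dog", "cat", "on", "the", "grass"]
TOKEN_EMB = {tok: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for tok in VOCAB}


def encode(token_vectors):
    """Toy stand-in for the frozen CLIP text encoder: mean-pool token vectors."""
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(DIM)]


def phi(latent, weight):
    """Hypothetical linear projection: latent embedding -> token-embedding space."""
    return [sum(weight[r][c] * latent[c] for c in range(DIM)) for r in range(DIM)]


def smp_loss(tokens, keywords, weight):
    """Self-masking projection objective for one caption (toy version)."""
    original = [TOKEN_EMB[t] for t in tokens]
    z = encode(original)                  # latent embedding of the original text
    pseudo_token = phi(z, weight)         # project it back into token space
    # Build the new text: every keyword token is replaced by the projected token.
    masked = [pseudo_token if t in keywords else TOKEN_EMB[t] for t in tokens]
    z_new = encode(masked)                # latent embedding of the new text
    # Assumed squared L2 distance; training phi would drive this toward zero.
    return sum((a - b) ** 2 for a, b in zip(z, z_new))


identity = [[1.0 if r == c else 0.0 for c in range(DIM)] for r in range(DIM)]
caption = ["a", "photo", "of", "a", "dog", "on", "the", "grass"]
print(smp_loss(caption, {"dog", "grass"}, identity))
```

When no keyword is masked, the new text is identical to the original and the loss is exactly zero; training only the projection while the text encoder stays frozen is what lets text-only data supervise it.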
🚀 News
- February 27, 2024 - LinCIR is accepted at CVPR 2024!
- December 5, 2023 - LinCIR is officially released!
🛠️ Installation
Get started with LinCIR by installing the necessary dependencies:
$ pip install torch transformers diffusers accelerate datasets spacy
$ python -m spacy download en_core_web_sm
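A quick sanity check after installation can confirm that the packages and the spaCy model are importable. This is a minimal, hypothetical helper (not shipped with LinCIR) that relies only on `importlib.util.find_spec`; it works for `en_core_web_sm` because spaCy models are installed as regular Python packages.

```python
import importlib.util

# Packages from the pip command in the installation step.
REQUIRED = ["torch", "transformers", "diffusers", "accelerate", "datasets", "spacy"]


def missing_packages(names=REQUIRED):
    """Return the top-level packages in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_spacy_model(model="en_core_web_sm"):
    """spaCy models are installed as Python packages, so find_spec detects them."""
    return importlib.util.find_spec(model) is not None


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    print("en_core_web_sm installed:", has_spacy_model())
```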
🤗 Demo
To run the demo locally, use the commands below.
You can also try the demo directly on the Hugging Face Space.
$ pip install clip-retrieval
$ python demo.py
The demo will be hosted at https://0.0.0.0:8000
📂 Dataset Preparation
You do not need to download the training datasets manually: all of them are fetched automatically with the Hugging Face datasets library.
Keep in mind that these text-only training datasets are considerably smaller than (image, caption) pair datasets or triplet datasets such as FashionIQ and CIRR.
See here for instructions on preparing the benchmark datasets.
📚 How to Train LinCIR
Train LinCIR with ease using the following command:
$ python -m torch.distributed.run --nproc_per_node 8 --nnodes 1 --node_rank 0 \
--master_addr localhost --master_port 5100 train_phi.py \
--batch_size 64 \
--output_dir /path/to/your_experiment \
--cirr_dataset_path /path/to/cir_datasets/CIRR \
--mixed_precision fp16 \
--clip_model_name large \
--validation_steps 1000 \
--checkpointing_steps 1000 \
--seed 12345 \
--lr_scheduler constant_with_warmup --lr_warmup_steps 0 \
--max_train_steps 20000
If you have a machine with 8 GPUs, simply run the above script. For a machine with a single GPU, set --nproc_per_node to 1 and adjust --batch_size to 256 or 512. The results will be consistent.
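The single-GPU suggestion lines up with the 8-GPU command through the effective batch size. The sketch below assumes that --batch_size is a per-process (per-GPU) value and that no gradient accumulation is used; both are assumptions, not documented facts about train_phi.py.

```python
def effective_batch_size(nproc_per_node: int, batch_size: int, nnodes: int = 1) -> int:
    """Samples per optimizer step, assuming --batch_size is per process (per GPU)."""
    return nnodes * nproc_per_node * batch_size


print(effective_batch_size(8, 64))   # 512: the 8-GPU training command
print(effective_batch_size(1, 512))  # 512: single GPU, --batch_size 512
print(effective_batch_size(1, 256))  # 256: single GPU, --batch_size 256
```

Under these assumptions, --batch_size 512 on one GPU reproduces the 8 x 64 setting exactly, while 256 halves the samples seen per step.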
To use ViT-L, ViT-H, or ViT-G as the CLIP backbone, set --clip_model_name to large, huge, or giga, respectively.
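If you launch runs from a script, a small wrapper can rebuild the training command for a chosen backbone. The `train_command` helper below is hypothetical; it only reuses the flags and values from the training command in this README and rejects backbone names other than large, huge, and giga.

```python
import shlex

# Backbone names accepted by --clip_model_name, per this README.
BACKBONES = {"large": "ViT-L", "huge": "ViT-H", "giga": "ViT-G"}


def train_command(clip_model_name="large", nproc_per_node=8, batch_size=64,
                  output_dir="/path/to/your_experiment",
                  cirr_dataset_path="/path/to/cir_datasets/CIRR"):
    """Return the README training command as an argument list (hypothetical helper)."""
    if clip_model_name not in BACKBONES:
        raise ValueError(f"clip_model_name must be one of {sorted(BACKBONES)}")
    return [
        "python", "-m", "torch.distributed.run",
        "--nproc_per_node", str(nproc_per_node), "--nnodes", "1", "--node_rank", "0",
        "--master_addr", "localhost", "--master_port", "5100", "train_phi.py",
        "--batch_size", str(batch_size),
        "--output_dir", output_dir,
        "--cirr_dataset_path", cirr_dataset_path,
        "--mixed_precision", "fp16",
        "--clip_model_name", clip_model_name,
        "--validation_steps", "1000",
        "--checkpointing_steps", "1000",
        "--seed", "12345",
        "--lr_scheduler", "constant_with_warmup", "--lr_warmup_steps", "0",
        "--max_train_steps", "20000",
    ]


# Single-GPU ViT-G run, printed as a shell-ready string.
print(shlex.join(train_command("giga", nproc_per_node=1, batch_size=512)))
```

Returning a list rather than a string keeps it usable with subprocess.run without shell quoting issues.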
💯 How to Evaluate LinCIR
CIRR (Test Set)
Evaluate LinCIR on the CIRR test set with the following command:
$ python generate_test_submission.py \
--eval-type phi \
--dataset cirr \
--dataset-path /path/to/CIRR \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large \
--submission-name lincir_results
Retrieved results will be saved as:
./submission/cirr/{submission-name}.json
./submission/cirr/subset_{submission-name}.json
Upload these files here to view the results.
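Before uploading, you may want to confirm that both submission files were written. This hypothetical helper only encodes the output paths listed in this README (./submission/{dataset}/{submission-name}.json and the subset_ variant); the same layout is listed for CIRCO below.

```python
from pathlib import Path


def submission_files(dataset: str, submission_name: str, root: str = "./submission"):
    """Output paths listed in this README for generate_test_submission.py."""
    base = Path(root) / dataset
    return [base / f"{submission_name}.json", base / f"subset_{submission_name}.json"]


def missing_submission_files(dataset: str, submission_name: str, root: str = "./submission"):
    """Return the expected submission files that do not exist yet."""
    return [p for p in submission_files(dataset, submission_name, root) if not p.exists()]


if __name__ == "__main__":
    print(missing_submission_files("cirr", "lincir_results") or "all files present")
```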
CIRR (Validation Set, Dev)
For the CIRR validation set, use the following command:
$ python validate.py \
--eval-type phi \
--dataset cirr \
--dataset-path /path/to/CIRR \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large
FashionIQ
To evaluate LinCIR on FashionIQ, run the following command:
$ python validate.py \
--eval-type phi \
--dataset fashioniq \
--dataset-path /path/to/fashioniq \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large
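FashionIQ and CIRR results are conventionally reported as Recall@K (FashionIQ typically at K = 10 and 50). For reference, here is a minimal sketch of the standard single-target Recall@K; it is an illustration of the metric, not the code that validate.py runs.

```python
def recall_at_k(ranked_ids, target_id, k):
    """1.0 if the single ground-truth target is in the top-k results, else 0.0."""
    return 1.0 if target_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(all_rankings, targets, k):
    """Average Recall@K over all queries, as a percentage."""
    hits = [recall_at_k(r, t, k) for r, t in zip(all_rankings, targets)]
    return 100.0 * sum(hits) / len(hits)


rankings = [["b", "a", "c"], ["c", "b", "a"]]  # ranked gallery IDs per query
targets = ["a", "a"]                           # ground-truth target per query
print(mean_recall_at_k(rankings, targets, 1))  # 0.0
print(mean_recall_at_k(rankings, targets, 2))  # 50.0
```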
CIRCO
Evaluate LinCIR on the CIRCO dataset with the command below:
$ python generate_test_submission.py \
--eval-type phi \
--dataset circo \
--dataset-path /path/to/cir_datasets/CIRCO \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large \
--submission-name lincir_results
Retrieved results will be saved as:
./submission/circo/{submission-name}.json
./submission/circo/subset_{submission-name}.json
Upload these files here to view the results.
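Because CIRCO queries can have several ground-truth targets, CIRCO is usually scored with mAP@K rather than Recall@K. The sketch below follows the common definition (precision at each relevant rank, normalized by min(K, number of ground truths)); it is an illustration, not the official evaluation server's code, and it assumes each ranking contains unique IDs.

```python
def average_precision_at_k(ranked_ids, gt_ids, k):
    """AP@K for one query with possibly multiple ground-truth targets."""
    gt = set(gt_ids)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in gt:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / min(k, len(gt))


def map_at_k(all_rankings, all_gts, k):
    """Mean AP@K over all queries, as a percentage."""
    aps = [average_precision_at_k(r, g, k) for r, g in zip(all_rankings, all_gts)]
    return 100.0 * sum(aps) / len(aps)


print(average_precision_at_k(["x", "a", "b"], ["a", "b"], 3))  # ≈ 0.583
```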
GeneCIS
Evaluating GeneCIS requires a few additional steps. First, get VG_100K_all and COCO_val2017 from GeneCIS. Then run the following script:
# Assuming you're in the lincir folder.
$ git fetch --all
$ git checkout eval_genecis
$ cd genecis
$ python evaluate.py \
--combiner_mode phi \
--model large \
--combiner_pretrain_path /path/to/lincir_best.pt \
--vg_100k_all_path /path/to/VG_100K_all \
--coco_val2017_path /path/to/val2017
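Since the GeneCIS evaluation depends on two externally downloaded image folders, a pre-flight check can catch wrong paths before the run starts. The `check_genecis_paths` helper is hypothetical; it only tests that each path passed to --vg_100k_all_path and --coco_val2017_path is an existing, non-empty directory.

```python
from pathlib import Path


def check_genecis_paths(vg_100k_all_path, coco_val2017_path):
    """Return a list of problems; an empty list means both folders look usable."""
    problems = []
    for flag, path in (("--vg_100k_all_path", vg_100k_all_path),
                       ("--coco_val2017_path", coco_val2017_path)):
        p = Path(path)
        if not p.is_dir():
            problems.append(f"{flag}: {p} is not a directory")
        elif not any(p.iterdir()):
            problems.append(f"{flag}: {p} is empty")
    return problems


if __name__ == "__main__":
    print(check_genecis_paths("/path/to/VG_100K_all", "/path/to/val2017") or "paths look OK")
```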
Acknowledgement
We would like to express our special gratitude to the authors of SEARLE for their invaluable contributions, as our code draws significant inspiration from this open-source project.
Citation
@inproceedings{gu2024lincir,
title={Language-only Training of Zero-shot Composed Image Retrieval},
author={Gu, Geonmo and Chun, Sanghyuk and Kim, Wonjae and Kang, Yoohoon and Yun, Sangdoo},
year={2024},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}
License
Licensed under CC BY-NC 4.0
LinCIR
Copyright (c) 2023-present NAVER Corp.
CC BY-NC-4.0 (https://creativecommons.org/licenses/by-nc/4.0/)