CREPE

This repository contains the code we used to evaluate open_clip CLIP models, the official OpenAI CLIP models, CyCLIP, FLAVA and ALBEF on compositional reasoning in our paper CREPE: Can Vision-Language Foundation Models Reason Compositionally?.

<img src="https://user-images.githubusercontent.com/36333627/228138304-86721dc2-3b8b-4c6e-acde-6bcfcd416cfe.png" width="800">

Systematicity procedure

<img src="https://user-images.githubusercontent.com/36333627/228139225-71dc2cda-a5e7-4710-a9e8-7fbb8281eb01.png" width="800">

Productivity procedure

<img src="https://user-images.githubusercontent.com/36333627/228139880-748609e2-9476-4c33-bca3-200deea369fe.png" width="800">

Evaluation instructions

crepe_eval_utils.py contains the common evaluation utility functions. You will need to replace vg_image_paths there with the path to the Visual Genome images on your machine; the VG images can be downloaded from the Visual Genome website.
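As a point of reference, the edit might look like the following minimal sketch (the list structure and local paths are assumptions; Visual Genome images are distributed in two parts, VG_100K and VG_100K_2):

```python
# crepe_eval_utils.py (hypothetical edit): point vg_image_paths at the
# directories holding your local copy of the Visual Genome images.
vg_image_paths = [
    '/data/visual_genome/VG_100K',    # assumed local path; adjust for your machine
    '/data/visual_genome/VG_100K_2',
]
```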

We evaluated all models on an NVIDIA TITAN X GPU with CUDA 11.4.

Evaluate open_clip CLIP models on systematicity and productivity

You will need to install the packages required to use open_clip (https://github.com/mlfoundations/open_clip). You can download the pretrained CLIP models and replace --model-dir with your own model checkpoint directory path in crepe_compo_eval_open_clip.py. (You can also modify the code to use open_clip's pretrained model interface.)
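If you take the pretrained-interface route, loading a model looks roughly like this (the architecture name and pretrained tag below are example choices, not necessarily the configurations evaluated in the paper):

```python
import open_clip

# Load a CLIP model through open_clip's pretrained interface rather than a
# local checkpoint directory; 'ViT-B-32' + 'laion400m_e32' are example picks.
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion400m_e32')
tokenizer = open_clip.get_tokenizer('ViT-B-32')
```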

To evaluate all models reported in our paper, simply run:

python -m crepe_compo_eval_open_clip --compo-type <compositionality_type> --hard-neg-types <negative_type_1> <negative_type_2> --input-dir <path_to_crepe/crepe/syst_hard_negatives> --output-dir <log_directory>

where the valid compositionality types are systematicity and productivity. The valid negative types are atom, comp and combined (atom+comp) for systematicity, and atom, swap and negate for productivity.
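For example, a systematicity run over both hard negative types might look like this (the input and output paths are illustrative):

python -m crepe_compo_eval_open_clip --compo-type systematicity --hard-neg-types atom comp --input-dir crepe/syst_hard_negatives --output-dir logs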

To evaluate other pretrained models, simply modify the --train-dataset argument and/or the DATA2MODEL variable in crepe_compo_eval_open_clip.py. Note that the systematicity eval set should only be used to evaluate models pretrained on CC12M, YFCC15M or LAION400M.
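As a rough illustration only, the mapping might be shaped like this (the keys and model names below are assumptions; check crepe_compo_eval_open_clip.py for the actual structure):

```python
# Hypothetical sketch of DATA2MODEL: a map from a pretraining dataset
# (the --train-dataset value) to the open_clip architectures trained on it.
DATA2MODEL = {
    'cc12m': ['RN50-quickgelu'],
    'yfcc': ['RN50-quickgelu', 'RN101-quickgelu'],
    'laion': ['ViT-B-32', 'ViT-B-16', 'ViT-L-14'],
}
```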

Evaluate all other vision-language models on productivity

For each model, you will need to clone the model's official repository, set up an environment according to its instructions and place the files crepe_prod_eval_<model>.py and crepe_eval_utils.py in the locations described in the model-specific sections below; the overall workflow is sketched after this paragraph. In crepe_params.py, you will need to replace --input-dir with your own directory path to CREPE's productivity hard negatives test set.
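In outline (the repository URL and destination are placeholders; each section below gives the exact location and any extra steps, such as checkpoint downloads):

git clone <model_repo_url>
cp crepe_prod_eval_<model>.py crepe_eval_utils.py <location_in_model_repo>
python -m crepe_prod_eval_<model> --hard-neg-types <negative_type> --output-dir <log_directory>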

CLIP-specific instructions

Clone the CLIP repository (https://github.com/openai/CLIP) and place crepe_prod_eval_clip.py and crepe_eval_utils.py at the top level of the repository. To evaluate models, simply run:

python -m crepe_prod_eval_clip --model-name <model_name> --hard-neg-types <negative_type> --output-dir <log_directory> 

where the valid negative types are atom, swap and negate, and model names are RN50, RN101, ViT-B/32, ViT-B/16 and ViT-L/14.
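For instance, to evaluate ViT-B/32 on swap negatives (the output directory is illustrative):

python -m crepe_prod_eval_clip --model-name ViT-B/32 --hard-neg-types swap --output-dir logs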

CyCLIP-specific instructions

Clone the CyCLIP repository (https://github.com/goel-shashank/CyCLIP), place crepe_prod_eval_cyclip.py and crepe_eval_utils.py at the top level of the repository, and download the model checkpoint into the folder cyclip.pt (the checkpoint is linked at the bottom of the repository's README). To evaluate models, simply run:

python -m crepe_prod_eval_cyclip --hard-neg-types <negative_type> --output-dir <log_directory>

FLAVA-specific instructions

Clone the FLAVA repository (https://github.com/facebookresearch/multimodal) and copy crepe_prod_eval_flava.py and crepe_eval_utils.py into the folder examples/flava/. To evaluate models, simply run:

python -m crepe_prod_eval_flava --hard-neg-types <negative_type> --output-dir <log_directory>

ALBEF-specific instructions

Clone the ALBEF repository (https://github.com/salesforce/ALBEF), copy crepe_prod_eval_albef.py and crepe_eval_utils.py to the top level of the repository and download the pretrained checkpoint marked '14M' from the repository. To evaluate models, simply run:

python -m crepe_prod_eval_albef --hard-neg-types <negative_type> --output-dir <log_directory>

Citation

If you find our work helpful, please cite us:

@article{ma2023crepe,
  title={CREPE: Can Vision-Language Foundation Models Reason Compositionally?},
  author={Zixian Ma and Jerry Hong and Mustafa Omer Gul and Mona Gandhi and Irena Gao and Ranjay Krishna},
  journal={arXiv preprint arXiv:2212.07796},
  year={2023},
}