ELEVATER Image Classification Toolkit

Introduction

The toolkit for image classification in the benchmark: Evaluation of Language-augmented Visual Task-level Transfer [ELEVATER].

Contents

Please follow the steps below to use this codebase to reproduce the results in the paper and to onboard your own checkpoints & methods.

  1. Installation
  2. Datasets
  3. Getting Started
  4. Evaluation
    1. Zero-shot
    2. Linear probe / Fine-tuning (Few-shot & Full-shot)
  5. Submit your results to vision leaderboard
  6. Extract GPT3 Knowledge

Installation

Our code base is developed and tested with PyTorch 1.7.0, TorchVision 0.8.0, CUDA 11.0, and Python 3.7.

conda create -n elevater python=3.7 -y
conda activate elevater

conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt
pip install -e .
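
If you would like to confirm the environment before running anything, a quick check like the one below (plain PyTorch/TorchVision calls, nothing toolkit-specific) should report the pinned versions:

# Sanity check for the pinned environment; expected output is 1.7.0 / 0.8.0
# and CUDA available on a GPU machine.
import torch
import torchvision

print("PyTorch:", torch.__version__)
print("TorchVision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())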

Datasets

We support the downstream evaluation of image classification on 20 datasets: Caltech101, CIFAR10, CIFAR100, Country211, DTD, EuroSat, FER2013, FGVCAircraft, Food101, GTSRB, HatefulMemes, KittiDistance, MNIST, Flowers102, OxfordPets, PatchCamelyon, SST2, RESISC45, StanfordCars, VOC2007. Our toolkit also supports ImageNet-1K evaluation, whose result is shown as a reference on the leaderboard.

To evaluate on these datasets, our toolkit automatically downloads them once with vision-datasets and stores them locally for future use. You do NOT need to explicitly download any dataset. However, if you would like to download all data before running experiments, please refer to [Data Download].

Getting Started

The ELEVATER benchmark supports three types of evaluation: zero-shot, linear probe, and fine-tuning. All three are embodied in a unified launch script, run.sh. By specifying different arguments, you can enable different settings, including:

Few-shot

Language-augmented model adaptation method

Utilization of external knowledge sources

To run the benchmark toolkit, please refer to the instructions in run.sh and modify them accordingly. By default, ./run.sh runs zero-shot evaluation of the CLIP ViT-B/32 checkpoint on the Caltech-101 dataset.

Launch Multiple Experiments

You may need to launch multiple experiments in batches, as the ELEVATER benchmark contains 20 datasets. We provide an example script, run_multi.sh, where you can specify different configurations directly from the command line without modifying the shell script.

DATASET=caltech101 \
OUTPUT_DIR=./output/experiment \
bash run_multi.sh

You can refer to run_multi.sh to add other customizable configurations; DATASET and OUTPUT_DIR above are two examples.
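
For example, a small driver along the following lines sweeps several datasets by setting DATASET and OUTPUT_DIR before each call to run_multi.sh. Only caltech101 and cifar10 are dataset names taken from this README; extend the list with the identifiers your run_multi.sh accepts.

# Minimal sketch: launch run_multi.sh once per dataset via environment
# variables, mirroring the single-dataset example above. The dataset list
# is a placeholder -- only caltech101 and cifar10 appear in this README.
import os
import subprocess

datasets = ["caltech101", "cifar10"]  # add the remaining dataset identifiers here

for name in datasets:
    env = dict(os.environ, DATASET=name, OUTPUT_DIR=f"./output/experiment_{name}")
    subprocess.run(["bash", "run_multi.sh"], env=env, check=True)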

Evaluation

Zero-shot Evaluation

Our implementation and prompts are from OpenAI repo: [Notebook] [Prompt].

For zero-shot evaluation, we support both models from the CLIP repo and customized models.
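
As a rough illustration of what prompt-based zero-shot classification looks like (in the spirit of the OpenAI notebook linked above, not the toolkit's own evaluation loop), the sketch below embeds prompted class names with the public clip package and compares them with an image embedding; the class names, prompt template, and image path are placeholders.

# Conceptual zero-shot classification with CLIP: prompt the class names,
# embed text and image, and take cosine similarity. Class names, the single
# prompt template, and "example.jpg" are illustrative placeholders.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["airplane", "dog", "flower"]
prompts = [f"a photo of a {c}." for c in class_names]  # the benchmark uses prompt ensembles

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print({c: float(p) for c, p in zip(class_names, probs[0])})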

To evaluate a customized model in the zero-shot setting, you need to:

Linear Probe and Fine-tuning

We use automatic hyperparameter tuning for the linear probe and fine-tuning evaluations. For details, please refer to Appendix Sec. D of our paper.
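
As a rough mental model only (not the toolkit's tuning procedure), a linear probe fits a linear classifier on frozen image features and picks the regularization strength on a validation split; the sketch below uses scikit-learn with random placeholder features.

# Conceptual linear probe: logistic regression on frozen features with a
# simple sweep over the inverse regularization strength C. This is a sketch
# only -- the toolkit's automatic hyperparameter tuning is more involved.
# Features and labels here are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
train_feats, train_labels = rng.normal(size=(500, 512)), rng.integers(0, 10, 500)
val_feats, val_labels = rng.normal(size=(100, 512)), rng.integers(0, 10, 100)

best_c, best_acc = None, -1.0
for c in [1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    clf = LogisticRegression(C=c, max_iter=1000).fit(train_feats, train_labels)
    acc = clf.score(val_feats, val_labels)
    if acc > best_acc:
        best_c, best_acc = c, acc

print(f"best C = {best_c}, val accuracy = {best_acc:.3f}")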

Models evaluated here can come from:

To evaluate a customized model, you need to:

Submit to Leaderboard

Leaderboard submissions are supported via EvalAI. Please first generate the prediction files locally, and then submit the results to EvalAI. Details are documented below.

Generate Prediction Files

You need to evaluate and generate prediction files for all 20 datasets before submitting to the leaderboard. However, to test that the pipeline is working correctly, you can submit partial evaluation results. The partially evaluated results can be found via the link in the "Result file" column. You may also optionally make them appear on the leaderboard, but the "Average Score" will not be computed because the results are incomplete.

To generate the prediction files, follow the steps below:

  1. Verify that prediction file submission is supported. Prediction file generation is only supported after commit 2c7a53c3. Please make sure that your local copy of our codebase is up to date.

  2. Generate prediction files for all datasets separately. Please make sure to modify the output folder accordingly so that the 20 prediction files for the same configuration appear within the same folder.

# Modify these two lines accordingly before calling run_multi.sh
DATASET=caltech101 \
OUTPUT_DIR=./output/exp_1_submit \
bash run_multi.sh
  3. Combine all prediction files into a single zip file. Assume /path_to_predictions contains all 20 JSON prediction files (60 files [20 datasets * 3 seeds] for few-shot experiments). The combined prediction file will be located at /path_to_predictions/all_predictions.zip.
python commands/prepare_submit.py \
  --combine_path /path_to_predictions

Examples of Prediction Files

Please check out the format illustration and examples for prediction files in submission_file_readme.md

Submit to EvalAI

View Leaderboard

Navigate to the Leaderboard tab to view all baseline results and results from the community.

Extract GPT3 Knowledge

Modify these three lines accordingly in run_gpt3.sh, and then run sh run_gpt3.sh:

OUTPUT_DIR=./output/exp_1_extract_knowledge  # the path where the generated GPT-3 knowledge is saved
apikey=XXXX  # please use your own GPT-3 API key
ds='cifar10'  # the dataset to extract knowledge for
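
For reference, the kind of request such a script issues looks roughly like the sketch below; the prompt wording, engine name, and class list are assumptions for illustration only, and run_gpt3.sh remains the source of truth for what the toolkit actually sends.

# Hypothetical sketch of querying GPT-3 for class-level knowledge with the
# legacy openai completion API (openai-python < 1.0, as used in 2022). The
# prompt wording, engine, and class names are illustrative assumptions.
import openai

openai.api_key = "XXXX"  # same key as `apikey` in run_gpt3.sh

class_names = ["airplane", "automobile", "bird"]  # e.g., a few CIFAR-10 classes

knowledge = {}
for name in class_names:
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=f"Describe what a {name} looks like:",
        max_tokens=64,
        temperature=0.7,
    )
    knowledge[name] = response["choices"][0]["text"].strip()

print(knowledge)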

Citation

Please cite our paper as below if you use the ELEVATER benchmark or our toolkit.

@article{li2022elevater,
    title={ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models},
    author={Li, Chunyuan and Liu, Haotian and Li, Liunian Harold and Zhang, Pengchuan and Aneja, Jyoti and Yang, Jianwei and Jin, Ping and Lee, Yong Jae and Hu, Houdong and Liu, Zicheng and Gao, Jianfeng},
    journal={Neural Information Processing Systems},
    year={2022}
}