

Evaluation Code for Cobra

This fork of vlm-evaluation provides the evaluation code for Cobra. Most of the content below comes from the main branch.

VLM Evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning.

Built with PyTorch, using sane quality defaults (black, ruff, pre-commit).


This repository is built on top of PyTorch. Although PyTorch is specified as a dependency of the package, we highly recommend that you first install the version you want (e.g., with accelerator support) for your hardware and dependency manager (e.g., conda). Otherwise, the version installed by default may be incompatible.

PyTorch installation instructions can be found here. This repository requires PyTorch >= 2.1.0, but has only been thoroughly tested with PyTorch 2.1.0, Torchvision 0.16.0, Torchaudio 2.1.0.

Once PyTorch has been properly installed, you can install this package locally via an editable installation:

git clone https://github.com/h-zhao1997/vlm-evaluation
cd vlm-evaluation
pip install -e .

Finally, make sure to copy your HuggingFace token to .hf_token.
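The token step can also be scripted. The sketch below assumes the token file lives at the repository root and holds only the token string; the helper name write_hf_token is hypothetical and not part of this repository.

```python
from pathlib import Path


def write_hf_token(token: str, repo_root: Path) -> Path:
    """Write the token to <repo_root>/.hf_token (hypothetical helper)."""
    token_path = repo_root / ".hf_token"
    # Strip stray whitespace so the file contains only the token itself.
    token_path.write_text(token.strip())
    # Restrict the file to the owner, since it holds a secret.
    token_path.chmod(0o600)
    return token_path


# Example usage, run from the repository root:
#   write_hf_token(os.environ["HF_TOKEN"], Path.cwd())
```

Remember to keep .hf_token out of version control.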


Prepare datasets for eval: scripts/datasets/prepare.py; model and evaluation dataset configs are defined in vlm_eval/conf

Entry Point: scripts/evaluate.py evaluates a given model on the specified dataset; model and evaluation dataset configs are defined in vlm_eval/conf.

Interactive GUI: scripts/interactive_demo.py loads a trained model and creates a gradio style interactive demo.

Scoring: scripts/score.py scores an evaluated model.


First, create the folders for the evaluation datasets and the results, for example /home/ubuntu/datasets/vlm-evaluation and /home/ubuntu/prismatic-vlms/results.
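If you prefer to create these folders from Python, a minimal sketch follows. The function name ensure_dirs is an assumption for illustration; the paths are the example locations above and should be adapted to your machine.

```python
from pathlib import Path


def ensure_dirs(*dirs):
    """Create each directory (and any missing parents) if it does not exist."""
    created = []
    for d in dirs:
        path = Path(d).expanduser()
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Example usage with the paths from this README:
#   ensure_dirs("/home/ubuntu/datasets/vlm-evaluation",
#               "/home/ubuntu/prismatic-vlms/results")
```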

(1) Prepare datasets for Text VQA:

python scripts/datasets/prepare.py --dataset_family text-vqa

where dataset_family can be selected from [vqa-v2, gqa, vizwiz, text-vqa, refcoco, ocid-ref, tally-qa, pope, vsr]
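To prepare several dataset families in one go, the command above can be generated in a loop. This is a sketch, not part of the repository: prepare_command and prepare_all are hypothetical helpers, and the only CLI flag used is the --dataset_family flag shown above.

```python
import subprocess
import sys

# Dataset families listed in this README.
DATASET_FAMILIES = [
    "vqa-v2", "gqa", "vizwiz", "text-vqa", "refcoco",
    "ocid-ref", "tally-qa", "pope", "vsr",
]


def prepare_command(dataset_family):
    """Build the prepare.py invocation for one dataset family."""
    if dataset_family not in DATASET_FAMILIES:
        raise ValueError(f"Unknown dataset family: {dataset_family}")
    return [
        sys.executable, "scripts/datasets/prepare.py",
        "--dataset_family", dataset_family,
    ]


def prepare_all(families, dry_run=True):
    """Return the commands; run them one by one unless dry_run is set."""
    commands = [prepare_command(f) for f in families]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing download or preprocessing step.
            subprocess.run(cmd, check=True)
    return commands


# Example usage, run from the repository root:
#   prepare_all(["text-vqa", "gqa"], dry_run=False)
```

The dry-run default lets you inspect the commands before starting any large downloads.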

(2) Evaluate the LLaVa 1.5 (7B) and Prism 7B models on the Text VQA slim dataset:

python scripts/evaluate.py --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b --dataset.type text-vqa-slim --dataset.root_dir /home/ubuntu/datasets/vlm-evaluation

For prismatic models you can either pass just a model_id:

python scripts/evaluate.py --model_id prism-dinosiglip+7b --dataset.type text-vqa-slim --dataset.root_dir /home/ubuntu/datasets/vlm-evaluation

or you can provide a path to a model directory with --model_dir. If a model_dir is provided, model_id will be ignored.

If you have multiple GPUs available:

accelerate launch --num_processes=<NUM_GPUS> scripts/evaluate.py --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b --dataset.type text-vqa-slim --dataset.root_dir /home/ubuntu/datasets/vlm-evaluation

accelerate launch --num_processes=<NUM_GPUS> scripts/evaluate.py --model_id prism-dinosiglip+7b --dataset.type text-vqa-slim --dataset.root_dir /home/ubuntu/datasets/vlm-evaluation

You can evaluate any models trained in the accompanying prismatic-vlms codebase by modifying the model_dir, model_family, and model_id above accordingly.
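The single-GPU and multi-GPU invocations differ only in the launcher, so they can be assembled by one helper. The sketch below is an illustration under that assumption; evaluate_command is a hypothetical name, and it only uses the flags that appear in the commands above.

```python
import shlex


def evaluate_command(model_id, dataset_type, root_dir,
                     model_family=None, model_dir=None, num_gpus=1):
    """Build the scripts/evaluate.py command as a list of arguments."""
    if num_gpus > 1:
        # Multi-GPU runs go through accelerate, one process per GPU.
        cmd = ["accelerate", "launch", f"--num_processes={num_gpus}"]
    else:
        cmd = ["python"]
    cmd.append("scripts/evaluate.py")
    if model_family is not None:
        cmd += ["--model_family", model_family]
    cmd += ["--model_id", model_id]
    if model_dir is not None:
        cmd += ["--model_dir", model_dir]
    cmd += ["--dataset.type", dataset_type, "--dataset.root_dir", root_dir]
    return cmd


# Print a copy-pasteable command for a two-GPU Prism run.
print(shlex.join(evaluate_command(
    "prism-dinosiglip+7b", "text-vqa-slim",
    "/home/ubuntu/datasets/vlm-evaluation", num_gpus=2,
)))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting issues.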

(3) Score the LLaVa 1.5 (7B) and Prism 7B models on Text VQA:

python scripts/score.py --model_id llava-v1.5-7b --dataset.type text-vqa-slim --dataset.root_dir /home/ubuntu/datasets/vlm-evaluation --results_dir /home/ubuntu/prismatic-vlms/results

python scripts/score.py --model_id prism-dinosiglip+7b --dataset.type text-vqa-slim --dataset.root_dir /home/ubuntu/datasets/vlm-evaluation --results_dir /home/ubuntu/prismatic-vlms/results

(4) To chat with the LLaVa 1.5 (7B) and Prism 7B models in an interactive GUI, run the following scripts in separate terminals.

Launch gradio controller:

python -m vlm_eval.serve.controller --host --port 10000

Launch web server:

python -m vlm_eval.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share

Now we can launch an interactive demo for each of the models we want to chat with. For Prism models, you only need to specify a model_id, while for LLaVA and InstructBLIP, you additionally need to specify a model_family and model_dir. Note that each model must be given a different port.

Launch interactive demo for Prism 7B Model:

python -m scripts.interactive_demo --port 40000 --model_id prism-dinosiglip+7b

Launch interactive demo for LLaVA 1.5 7B Model:

python -m scripts.interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b
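Because every demo needs its own port, it can help to generate the launch commands from a list of models. The following sketch assumes ports are assigned consecutively starting at 40000, as in the two examples above; demo_commands is a hypothetical helper, not part of the repository.

```python
def demo_commands(models, base_port=40000):
    """Build one interactive_demo command per model, each on a distinct port."""
    commands = []
    for offset, model in enumerate(models):
        cmd = ["python", "-m", "scripts.interactive_demo",
               "--port", str(base_port + offset)]
        # Only pass the flags that were provided for this model.
        for flag in ("model_family", "model_id", "model_dir"):
            if flag in model:
                cmd += [f"--{flag}", model[flag]]
        commands.append(cmd)
    return commands


models = [
    {"model_id": "prism-dinosiglip+7b"},
    {"model_family": "llava-v15", "model_id": "llava-v1.5-7b",
     "model_dir": "liuhaotian/llava-v1.5-7b"},
]
for cmd in demo_commands(models):
    print(" ".join(cmd))
```

Each printed command is meant to be run in its own terminal, after the controller and web server are up.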

When running the demo, the following parameters are adjustable:

The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other interaction modes for more specific use cases:


Before committing to the repository, make sure to set up your dev environment!

Here are the basic development environment setup guidelines:

Repository Structure

High-level overview of repository/project file-tree: