Uncertainty-Aware Evaluation for Vision-Language Models

<p align="center"> <img src="images/logo.png" width="50%" /> </p>

Introduction

Datasets

Evaluation

Accuracy Results

| model_name | MMB | OOD | SQA | SB | AI2D | Avg. |
|---|---|---|---|---|---|---|
| LLaVA-v1.6-Vicuna-13B | 76.75 | 72.93 | 70.56 | 70.37 | 73.67 | 72.85 |
| Monkey-Chat | 76.98 | 70.6 | 74.66 | 66.1 | 67.95 | 71.26 |
| LLaVA-v1.6-Vicuna-7B | 75.56 | 73.7 | 65.86 | 69.06 | 69.75 | 70.78 |
| InternLM-XComposer2-VL | 71.77 | 70.04 | 77.95 | 64.44 | 66.13 | 70.07 |
| Yi-VL-6B | 75.24 | 73.91 | 66.72 | 66.25 | 58.84 | 68.19 |
| CogAgent-VQA | 74.78 | 68.57 | 67.12 | 68.01 | 58.2 | 67.34 |
| MobileVLM_V2-7B | 75.97 | 66.53 | 72.33 | 66.71 | 53.55 | 67.02 |
| MoE-LLaVA-Phi2-2.7B | 73.73 | 74.82 | 64.04 | 66.42 | 55.76 | 66.95 |
| mPLUG-Owl2 | 73.05 | 73.28 | 65.71 | 61.49 | 54.38 | 65.58 |
| Qwen-VL-Chat | 71.4 | 54.22 | 63.23 | 59.79 | 65.09 | 62.74 |
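
In each table, the Avg. column appears to be the simple mean of the five benchmark columns; e.g., for LLaVA-v1.6-Vicuna-13B:

$$
\text{Avg.} = \frac{76.75 + 72.93 + 70.56 + 70.37 + 73.67}{5} \approx 72.85
$$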

Set Sizes Results

| model_name | MMB | OOD | SQA | SB | AI2D | Avg. |
|---|---|---|---|---|---|---|
| LLaVA-v1.6-Vicuna-13B | 2.34 | 2.18 | 2.45 | 2.49 | 2.33 | 2.36 |
| Monkey-Chat | 2.7 | 2.92 | 2.56 | 3.26 | 3.19 | 2.93 |
| LLaVA-v1.6-Vicuna-7B | 2.37 | 2.34 | 2.45 | 2.53 | 2.37 | 2.41 |
| InternLM-XComposer2-VL | 2.72 | 2.2 | 2.41 | 3.08 | 3.02 | 2.69 |
| Yi-VL-6B | 2.47 | 2.02 | 2.76 | 2.61 | 3 | 2.57 |
| CogAgent-VQA | 2.33 | 2.46 | 2.36 | 2.49 | 2.94 | 2.52 |
| MobileVLM_V2-7B | 2.53 | 2.61 | 2.62 | 2.8 | 3.4 | 2.79 |
| MoE-LLaVA-Phi2-2.7B | 2.54 | 1.89 | 2.7 | 2.69 | 2.92 | 2.55 |
| mPLUG-Owl2 | 2.55 | 2.09 | 2.71 | 2.93 | 3 | 2.65 |
| Qwen-VL-Chat | 2.7 | 3.32 | 2.9 | 3.32 | 3.1 | 3.07 |

Uncertainty-aware Accuracy Results

| model_name | MMB | OOD | SQA | SB | AI2D | Avg. |
|---|---|---|---|---|---|---|
| LLaVA-v1.6-Vicuna-13B | 90.41 | 86.29 | 78.04 | 73.87 | 84.58 | 82.64 |
| Monkey-Chat | 83.41 | 63.22 | 81.7 | 52.49 | 56.08 | 67.38 |
| LLaVA-v1.6-Vicuna-7B | 87.87 | 82.81 | 69.77 | 70.69 | 77.2 | 77.67 |
| InternLM-XComposer2-VL | 69.98 | 80.49 | 94.37 | 52.6 | 56.4 | 70.77 |
| Yi-VL-6B | 84.56 | 95.05 | 64.01 | 64.56 | 49.31 | 71.5 |
| CogAgent-VQA | 85.56 | 71.14 | 72.4 | 69.96 | 49 | 69.61 |
| MobileVLM_V2-7B | 84.19 | 64.35 | 79.07 | 62.18 | 39.39 | 65.84 |
| MoE-LLaVA-Phi2-2.7B | 82.83 | 100.73 | 61.18 | 64.67 | 47.87 | 71.46 |
| mPLUG-Owl2 | 78.4 | 89.24 | 62.92 | 52.91 | 45.09 | 65.71 |
| Qwen-VL-Chat | 69.58 | 40.28 | 54.71 | 44.7 | 54.3 | 52.71 |

Getting started

Six groups of models can be launched from one environment: LLaVA, CogVLM, Yi-VL, Qwen-VL, InternLM-XComposer, and MoE-LLaVA. This environment can be created with the following commands:

```bash
python3 -m venv venv
source venv/bin/activate
pip install git+https://github.com/haotian-liu/LLaVA.git
pip install git+https://github.com/PKU-YuanGroup/MoE-LLaVA.git --no-deps
pip install deepspeed==0.9.5
pip install -r requirements.txt
pip install xformers==0.0.23 --no-deps
```

The mPLUG-Owl2 model can be launched from the following environment:

```bash
python3 -m venv venv_mplug
source venv_mplug/bin/activate
git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl2
git checkout 74f6be9f0b8d42f4c0ff9142a405481e0f859e5c
pip install -e .
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
cd ../../
pip install -r requirements.txt
```

Monkey models can be launched from the following environment:

```bash
python3 -m venv venv_monkey
source venv_monkey/bin/activate
git clone https://github.com/Yuliang-Liu/Monkey.git
cd ./Monkey
pip install -r requirements.txt
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
cd ../
pip install -r requirements.txt
```

To check all models, you can run scripts/test_model_logits.sh.

To work with Yi-VL:

```bash
apt-get install git-lfs
cd ../
git clone https://huggingface.co/01-ai/Yi-VL-6B
```

Model logits

To get model logits on the benchmarks, run the commands from scripts/run.sh.

To quantify uncertainty from the logits:

```bash
python -m uncertainty_quantification_via_cp --result_data_path 'output' --file_to_write 'full_result.json'
```
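
The uncertainty_quantification_via_cp step applies conformal prediction to the saved logits (following LLM-Uncertainty-Bench; see Acknowledgement). For intuition only, here is a minimal, self-contained sketch of split conformal prediction with an LAC-style nonconformity score; the function name, array shapes, and the alpha value are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of split conformal prediction over answer-option logits.
# NOT the repository's implementation: names, shapes, and alpha are
# illustrative assumptions.
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conformal_prediction_sets(calib_logits, calib_labels, test_logits, alpha=0.1):
    """Build prediction sets with (1 - alpha) marginal coverage.

    calib_logits: (N, K) logits over K answer options for N calibration questions
    calib_labels: (N,)   index of the correct option per calibration question
    test_logits:  (M, K) logits for M test questions
    """
    calib_probs = softmax(calib_logits)
    # Nonconformity score: 1 - probability assigned to the true option.
    scores = 1.0 - calib_probs[np.arange(len(calib_labels)), calib_labels]
    # Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    qhat = np.quantile(scores, q_level, method="higher")
    # Prediction set: every option whose score falls within the threshold.
    test_probs = softmax(test_logits)
    sets = [np.where(1.0 - p <= qhat)[0] for p in test_probs]
    sizes = np.array([len(s) for s in sets])
    return sets, sizes

# Toy usage with random logits and 6 answer options.
rng = np.random.default_rng(0)
sets, sizes = conformal_prediction_sets(
    calib_logits=rng.normal(size=(500, 6)),
    calib_labels=rng.integers(0, 6, size=500),
    test_logits=rng.normal(size=(100, 6)),
    alpha=0.1,
)
print("average prediction set size:", sizes.mean())
```

The average prediction-set size produced this way is the kind of quantity reported in the Set Sizes Results table above.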

To generate result tables from the uncertainty results:

```bash
python -m make_tables --result_path 'full_result.json' --dir_to_write 'tables'
```

Citation

```bibtex
@article{kostumov2024uncertainty,
  title={Uncertainty-Aware Evaluation for Vision-Language Models},
  author={Kostumov, Vasily and Nutfullin, Bulat and Pilipenko, Oleg and Ilyushin, Eugene},
  journal={arXiv preprint arXiv:2402.14418},
  year={2024}
}
```

Acknowledgement

LLM-Uncertainty-Bench: conformal prediction applied to LLMs. Thanks to the authors for providing the framework.

Contact

We welcome suggestions to help us improve the benchmark. For any queries, please contact us at v.kostumov@ensec.ai. If you find something interesting, please also feel free to share it with us by email or open an issue. Thanks!