
<p align="center"> <img src="imgs/tittle_fig.png" width="150" style="margin-bottom: 0.2em;"/> </p>

<h2 align="center"> <a href="https://arxiv.org/abs/2312.02896"> [ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models</a></h2>

<h4 align="center"> <a href="https://github.com/RizhaoCai">Rizhao Cai*</a><sup>1</sup>, <a href="https://github.com/ZiruiSongBest">Zirui Song*</a><sup>2,3</sup>, <a href="https://dayan-guan.github.io/">Dayan Guan†</a><sup>1</sup>, <a href="https://zhenhaochenofficial.github.io/">Zhenhao Chen</a><sup>4</sup>, <a href="https://github.com/dylanli-hang">Yaohang Li</a><sup>2,3</sup>, <a>Xing Luo</a><sup>5</sup>, <a href="https://github.com/Newbeeyoung">Chenyu Yi</a><sup>1</sup>, <a href="https://personal.ntu.edu.sg/eackot/">Alex Kot</a><sup>1</sup> </h4>

<ul align="center">
  <li><sup>1</sup>Nanyang Technological University</li>
  <li><sup>2</sup>University of Technology Sydney</li>
  <li><sup>3</sup>Northeastern University</li>
  <li><sup>4</sup>Mohamed bin Zayed University of Artificial Intelligence</li>
  <li><sup>5</sup>Zhejiang University</li>
</ul>

<h5 align="center"> *Equal contribution, †Corresponding Author </h5>

<h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest updates. </h5>


Benchmark Examples

Demo

Note: For simplicity of presentation, the questions in Domestic Robot and Open Game have been simplified from their original multiple-choice format. Please see our Benchmark for more examples and the detailed questions.

Directory Structure

Evaluate on our Benchmark

git clone git@github.com:AIFEG/BenchLMM.git
cd BenchLMM
mkdir evaluate_results
Run your model on each benchmark subset and save one answer per line in JSONL format, for example:

{
  "question_id": 110,
  "prompt": "Is there any defect in the object in this image? Answer the question using a single word or phrase.",
  "model_output": "Yes"
}
Organize the answer files under evaluate_results/, one JSONL file per benchmark subset:

.
├── answers_Benchmark_AD.jsonl
├── xxxxxxxx_CT.jsonl
├── xxxxxxxx_MRI.jsonl
├── xxxxxxxx_Med-X-RAY.jsonl
├── xxxxxxxx_RS.jsonl
├── xxxxxxxx_Robots.jsonl
├── xxxxxxxx_defect_detection.jsonl
├── xxxxxxxx_game.jsonl
├── xxxxxxxx_infrard.jsonl
├── xxxxxxxx_style_cartoon.jsonl
├── xxxxxxxx_style_handmake.jsonl
├── xxxxxxxx_style_painting.jsonl
├── xxxxxxxx_style_sketch.jsonl
├── xxxxxxxx_style_tattoo.jsonl
└── xxxxxxxx_xray.jsonl
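For reference, the sketch below shows one way to produce an answer file in this format. It is only illustrative: run_model is a placeholder for whatever inference call your LMM exposes, and the question-file field names (question_id, image, text) follow the LLaVA-style VQA convention and may differ from the actual BenchLMM question files.

```python
import json
import os

# Illustrative only: `run_model` stands in for your LMM's inference call, and the
# question-file field names below are assumptions, not the official BenchLMM format.
def run_model(image_path: str, prompt: str) -> str:
    raise NotImplementedError("Replace with your model's inference call.")

def generate_answers(question_file: str, image_root: str, answer_file: str) -> None:
    """Read a question JSONL and write one answer JSON object per line."""
    with open(question_file) as fin, open(answer_file, "w") as fout:
        for line in fin:
            q = json.loads(line)
            output = run_model(os.path.join(image_root, q["image"]), q["text"])
            fout.write(json.dumps({
                "question_id": q["question_id"],
                "prompt": q["text"],
                "model_output": output,
            }) + "\n")

# Example call (paths are placeholders):
# generate_answers("benchmark/defect_detection.jsonl", "images/",
#                  "evaluate_results/MyModel_defect_detection.jsonl")
```

Once every subset has an answer file in evaluate_results/, run the evaluation script: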
bash scripts/evaluate.sh

Note: Scores will be saved in the file results. The Robots and game scores are included in evaluate_results/Robots.jsonl and evaluate_results/game.jsonl, respectively.

Baseline

| Model | VRAM required |
|---|---|
| InstructBLIP-7B | 30 GB |
| InstructBLIP-13B | 65 GB |
| LLaVA-1.5-7B | <24 GB |
| LLaVA-1.5-13B | 30 GB |

LLaVA

  1. Clone the LLaVA repository and navigate to the LLaVA folder
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
  2. Install the package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
  4. LLaVA weights
    Please check out the LLaVA Model Zoo for all public LLaVA checkpoints and instructions on how to use the weights.
  5. Add the file BenchLMM_LLaVA_model_vqa.py to the path LLaVA/llava/eval/

  6. Modify the file paths in scripts/LLaVA.sh, then run the script

bash scripts/LLaVA.sh
  7. Evaluate results
bash scripts/evaluate.sh

Note: Scores will be saved in the file results.
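Before running scripts/evaluate.sh, it can be useful to sanity-check the generated answer files. The snippet below is only a convenience sketch (not part of the official evaluation) and assumes the answer format shown earlier:

```python
import glob
import json

# Minimal sanity check for answer files in evaluate_results/ (illustrative only):
# count the answers per file and flag records missing the expected keys.
REQUIRED_KEYS = {"question_id", "prompt", "model_output"}

for path in sorted(glob.glob("evaluate_results/*.jsonl")):
    n_answers, n_bad = 0, 0
    with open(path) as f:
        for line in f:
            n_answers += 1
            record = json.loads(line)
            if not REQUIRED_KEYS.issubset(record):
                n_bad += 1
    print(f"{path}: {n_answers} answers, {n_bad} missing required keys")
```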

InstructBLIP

git clone https://github.com/salesforce/LAVIS.git  
cd LAVIS  
pip install -e .  

Modify the file paths in BenchLMM/scripts/InstructBLIP.sh, then run the inference and evaluation scripts:

bash BenchLMM/scripts/InstructBLIP.sh
bash BenchLMM/scripts/evaluate.sh

Note: Scores will be saved in the file results.
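To verify that the LAVIS installation works before running the full benchmark, the snippet below loads InstructBLIP and answers a single question about one image. This is only a sketch: the model name and type follow the LAVIS model zoo naming for InstructBLIP (Vicuna-7B), and example.jpg is a placeholder path.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Load InstructBLIP (Vicuna-7B) via LAVIS; adjust name/model_type to the
# checkpoint you actually intend to benchmark.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)

# Placeholder image path; replace with an image from the benchmark.
raw_image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Ask a BenchLMM-style question and print the generated answer.
answer = model.generate({
    "image": image,
    "prompt": "Is there any defect in the object in this image? Answer the question using a single word or phrase.",
})
print(answer)
```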


Cite our work

@article{cai2023benchlmm,
  title={BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models},
  author={Cai, Rizhao and Song, Zirui and Guan, Dayan and Chen, Zhenhao and Luo, Xing and Yi, Chenyu and Kot, Alex},
  journal={arXiv preprint arXiv:2312.02896},
  year={2023}
}

Contact

If you have any questions or issues with our project, please contact Dayan Guan: dayan.guan@outlook.com

Acknowledgement

This research is supported in part by the Rapid-Rich Object Search (ROSE) Lab of Nanyang Technological University and the NTU-PKU Joint Research Institute (a collaboration between NTU and Peking University that is sponsored by a donation from the Ng Teng Fong Charitable Foundation). We are deeply grateful to Yaohang Li from the University of Technology Sydney for his invaluable assistance in conducting the experiments, and to Jingpu Yang, Helin Wang, Zihui Cui, Yushan Jiang, Fengxian Ji, and Yuxiao Hang from NLULab@NEUQ (Northeastern University at Qinhuangdao, China) for their meticulous efforts in annotating the dataset. We would also like to thank Prof. Miao Fang (PI of NLULab@NEUQ) for his supervision and insightful suggestions during discussions of this project.

Related projects