Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
Chenhang Cui*, Yiyang Zhou*, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao
<div align="center"> *Equal Contribution </div> <div align="center"> <a href="https://arxiv.org/pdf/2311.03287.pdf"><img src="assets/Paper-Arxiv-orange.svg" ></a> </div>

🚨 Revealing the minds of GPT-4V(ision)! We're exploring the limitations of Visual Language Models!
🚨 Our dataset is now available on Hugging Face, thanks to Haoqin's contribution.
Getting Started
Clone our repository, then create a Python environment and activate it:
git clone https://github.com/gzcch/Bingo.git
cd Bingo
Download the dataset from here
cd Bingo_benchmark
pip install -r requirements.txt
Convert your model's response file to match the format of the example file here, following the annotation file here. Then run the evaluation:
python bingo_gpt_eval.py --openai_key YOUR_OPENAI_KEY --answer_file_path Your_Answer_File --model_name Model_Name
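If you evaluate several models, it may be convenient to wrap the command above in a small helper. The following is only a sketch: it uses the three flags shown above and nothing else, and it assumes the key is stored in an environment variable named `OPENAI_API_KEY` (a choice made for this example, not something `bingo_gpt_eval.py` is known to read). The helper names `build_eval_command` and `run_eval` are hypothetical.

```python
import os
import subprocess
import sys


def build_eval_command(answer_file, model_name, openai_key):
    """Assemble the bingo_gpt_eval.py invocation from the flags shown above."""
    return [
        sys.executable, "bingo_gpt_eval.py",
        "--openai_key", openai_key,
        "--answer_file_path", answer_file,
        "--model_name", model_name,
    ]


def run_eval(answer_file, model_name):
    """Run the evaluation script; call this from inside Bingo_benchmark."""
    # Assumption: the key lives in OPENAI_API_KEY. This keeps it out of
    # shell history; the script itself still receives it via --openai_key.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the evaluation.")
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(
        build_eval_command(answer_file, model_name, key), check=True
    )
```

For example, `run_eval("Your_Answer_File", "Model_Name")` runs the same command as the line above, with the key taken from the environment.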
With our 'Bingo' benchmark, we examine two prevalent sources of inaccuracy in GPT-4V(ision): bias and interference.
Region Bias
🌍 GPT-4V(ision) exhibits region bias: it performs better on Western images than on images from other regions (e.g., East Asia, Africa).
<div style="display: flex; justify-content: center;"> <img src="assets/regio.png" alt="None" style="width: 45%; margin-right: 5px;"> <img src="assets/region_2.jfif" alt="None" style="width: 45%; margin-left: 5px;"> </div>

OCR Bias
GPT-4V(ision) recognizes English and French text in images more accurately than text in the other three languages tested.
<div style="display: flex; justify-content: center;"> <img src="assets/regio.png" alt="None" style="width: 45%; margin-right: 5px;"> <img src="assets/language_2.jfif" alt="None" style="width: 45%; margin-left: 5px;"> </div>

Factual Bias
GPT-4V(ision) gets tripped up by counterfactual images, sticking to 'common sense' instead of what is actually in the image. For example, given a solar system image with Saturn missing, it still describes Saturn.
<div style="display: flex; justify-content: center;"> <img src="assets/factual.png" alt="None" style="width: 45%; margin-right: 5px;"> <img src="assets/factual_main.png" alt="None" style="width: 45%; margin-left: 5px;"> </div>

Image-to-Image Interference
Composite images lead to confusion! GPT-4V(ision) struggles to tell apart the parts of a combined image when they contain visually similar elements, even if each individual image is simple for a human.
<div style="display: flex; justify-content: center;"> <img src="assets/i2i.png" alt="None" style="width: 45%; margin-right: 5px;"> <img src="assets/i2i_2.jfif" alt="None" style="width: 45%; margin-left: 5px;"> </div>

Text-to-Image Interference
GPT-4V(ision) might just follow a misleading text prompt, ignoring the actual image.
<div style="display: flex; justify-content: center;"> <img src="assets/t2i.png" alt="None"> </div>

Citation
If you find this work useful, please consider citing our paper as follows:
@misc{cui2023holistic,
title={Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges},
author={Chenhang Cui and Yiyang Zhou and Xinyu Yang and Shirley Wu and Linjun Zhang and James Zou and Huaxiu Yao},
year={2023},
eprint={2311.03287},
archivePrefix={arXiv},
primaryClass={cs.LG}
}