MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

Oversensitive · Safety Alignment · Multi-Modal · MOSSBench · GPT-4 · Gemini-Pro · Claude-3

Code for the Paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?".

For more details, please refer to the project page with dataset exploration and visualization tools: https://turningpoint-ai.github.io/MOSSBench/.

:bell: If you have any questions or suggestions, please don't hesitate to let us know. You can comment on Twitter or post an issue on this repository.

[Webpage] [Paper] [Huggingface Dataset] [Visualization] [Result Explorer] [Twitter]

<p align="center"> <img src="website/static/images/psych.webp" width="40%"> <br> Logo for <b>MOSSBench</b> generated by DALL·E 3. </p>

Outlines

💥 News 💥


👀 About MOSSBench

Humans are prone to cognitive distortions, biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced MLLMs exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts.

<p align="center"> <img src="website/static/images/main_plot.png" width="70%"> <br> Overview of <b>MOSSBench</b>. MLLMs exhibit behaviors similar to human cognitive distortions, leading to oversensitive responses where benign queries are perceived as harmful. We discover that oversensitivity prevails among existing MLLMs. </p>

As an initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers on Amazon Mechanical Turk (AMT).

<p align="center"> <img src="website/static/images/stimuli.jpg" width="70%"> <br> Three types of stimuli in <b>MOSSBench</b>. </p>

Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages of the MLLM response process, namely perception, intent reasoning, and safety decision-making. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications.

For more details, see our project page (https://turningpoint-ai.github.io/MOSSBench/) and our paper (https://arxiv.org/abs/2406.17806).

🏆 Leaderboard 🏆

Contributing to the Leaderboard

🚨🚨 The leaderboard is continuously being updated.

The evaluation instructions are available at 🔮 Evaluations on MOSSBench and 📝 Evaluation Scripts of Our Models.

To submit your results to the leaderboard, please send your result file to this email (we will generate the score file for you), referring to the template file below:

Oversensitivity on MOSSBench

Refusal rate (%) of MLLMs:

| # | Model | Availability | Date | ALL | Exaggerated Risk | Negated Harm | Counterintuitive Interpretation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Claude 3 Opus (web) | Proprietary MLLMs - Web version | 2024-06-22 | 70.67 | 41 | 93 | 78 |
| 2 | Gemini Advanced | Proprietary MLLMs - Web version | 2024-06-22 | 61 | 41 | 67 | 75 |
| 3 | Claude 3 Sonnet | Proprietary MLLMs | 2024-06-22 | 55 | 39 | 65 | 61 |
| 4 | Claude 3 Haiku | Proprietary MLLMs | 2024-06-22 | 49.33 | 27 | 58 | 63 |
| 5 | Claude 3 Opus | Proprietary MLLMs | 2024-06-22 | 34.67 | 11 | 43 | 55 |
| 6 | Gemini Pro 1.5 | Proprietary MLLMs | 2024-06-22 | 29.33 | 25 | 28 | 35 |
| 7 | Qwen-VL-Chat | Open-source MLLMs | 2024-06-22 | 21.67 | 16 | 13 | 36 |
| 8 | InternLM-Xcomposer2-7b | Open-source MLLMs | 2024-06-22 | 17.67 | 14 | 11 | 28 |
| 9 | Gemini Pro Vision | Proprietary MLLMs | 2024-06-22 | 17 | 20 | 9 | 22 |
| 10 | Reka | Proprietary MLLMs | 2024-06-22 | 16.67 | 11 | 21 | 18 |
| 11 | InstructBLIP-Vicuna-7b | Open-source MLLMs | 2024-06-22 | 15.67 | 21 | 23 | 3 |
| 12 | IDEFICS-9b-Instruct | Open-source MLLMs | 2024-06-22 | 13.67 | 17 | 9 | 15 |
| 13 | MiniCPM-V 2.0 | Open-source MLLMs | 2024-06-22 | 12.33 | 16 | 11 | 10 |
| 14 | LLaVA-1.5-7b | Open-source MLLMs | 2024-06-22 | 12.33 | 18 | 10 | 9 |
| 15 | mPLUG-Owl2 | Open-source MLLMs | 2024-06-22 | 10 | 11 | 7 | 12 |
| 16 | LLaVA-1.5-13b | Open-source MLLMs | 2024-06-22 | 9.67 | 9 | 9 | 11 |
| 17 | GPT-4o | Proprietary MLLMs | 2024-06-22 | 6.33 | 6 | 8 | 5 |
| 18 | MiniCPM-Llama3-V 2.5 | Open-source MLLMs | 2024-06-22 | 6 | 8 | 5 | 5 |
| 19 | GPT-4o | Proprietary MLLMs - Web version | 2024-06-22 | 4 | 6 | 2 | 4 |
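
With the three stimulus types equally sized, the ALL column equals the refusal rate over all queries, i.e., the mean of the three per-type rates. Below is a minimal sketch of that bookkeeping in Python; it is an illustration with made-up labels, not the repository's scoring code, and it assumes one binary refusal label per sample (1 = refused, 0 = complied):

# Illustrative only: per-type and overall refusal rates from binary refusal labels.
def refusal_rates(refusals_by_type):
    # refusals_by_type maps a stimulus type to a list of 0/1 refusal labels
    per_type = {t: 100 * sum(labels) / len(labels) for t, labels in refusals_by_type.items()}
    all_labels = [label for labels in refusals_by_type.values() for label in labels]
    per_type["ALL"] = 100 * sum(all_labels) / len(all_labels)
    return per_type

print(refusal_rates({
    "Exaggerated Risk": [1, 0, 0, 0],
    "Negated Harm": [1, 1, 0, 0],
    "Counterintuitive Interpretation": [0, 0, 1, 1],
}))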

📊 Dataset Examples

Examples of 3 types of oversensitivity stimuli:

  1. Exaggerated Risk
<img src="website/static/images/results-examples/Exaggerated.png" style="zoom:20%;" />
  2. Negated Harm
<img src="website/static/images/results-examples/Negated.png" style="zoom:20%;" />
  3. Counterintuitive Interpretation
<img src="website/static/images/results-examples/Counterintuitive.png" style="zoom:20%;" />

📖 Dataset Usage

Data Downloading

You can download this dataset with the following code (make sure that you have installed Huggingface Datasets):

from datasets import load_dataset

dataset = load_dataset("AIcell/MOSSBench", "oversensitivity")

Here are some examples of how to access the downloaded dataset:

# print the first example in the train split
print(dataset["train"][0])
print(dataset["train"][0]['pid']) # print the problem id 
print(dataset["train"][0]['question']) # print the question text 
print(dataset["train"][0]['image']) # print the image path
dataset["train"][0]['decoded_image'] # display the image

Data Format

The dataset is provided in JSON format and contains the following attributes:

{
    "image": [string] A file path pointing to the associated image,
    "short description": [string] An oracle short description of the associated image,
    "question": [string] A query regarding to the image, 
    "pid": [string] Problem ID, e.g., "1",
    "metadata": {
        "over": [string] Oversensitivity type,
        "human": [integer] Whether image contains human, e.g. 0 or 1,
        "child": [integer] Whether image contains child, e.g. 0 or 1,
        "syn": [integer] Whether image is synthesized, e.g. 0 or 1,
        "ocr": [integer] Whether image contains ocr, e.g. 0 or 1,
        "harm": [integer] Which harm type the query belongs to, 0-7,
    }
}
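
As an illustration, these metadata fields can be used to slice the dataset with the standard Huggingface Datasets API. The following is a minimal sketch, assuming the metadata is stored as the nested structure above; the filter conditions are only examples, and the exact strings stored in "over" should be checked against the dataset itself:

from collections import Counter
from datasets import load_dataset

dataset = load_dataset("AIcell/MOSSBench", "oversensitivity")

# count examples per oversensitivity type
print(Counter(example['metadata']['over'] for example in dataset["train"]))

# keep only queries whose images contain children
child_subset = dataset["train"].filter(lambda example: example['metadata']['child'] == 1)
print(len(child_subset))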

Data Visualization

🎰 You can explore the dataset in an interactive way here.

🔮 Evaluations on MOSSBench

Requirements

Install the Python dependencies if you would like to reproduce our results for ChatGPT, GPT-4, Claude-2, and Bard:

pip install -r requirements.txt

Evaluation Pipelines

Step 1. Prepare your MLLM

For proprietary MLLMs

Get your model API keys ready via the following links,

and store them under the folder path_to_your_code/api_keys/[model].text. Replace [model] with anthropic_keys, google_keys, or openai_keys.
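
For illustration only, a key file laid out this way could be read with a few lines of Python. The helper below is hypothetical, not the repository's actual loading code:

from pathlib import Path

def load_api_key(model: str, key_dir: str = "path_to_your_code/api_keys") -> str:
    # model is one of "anthropic_keys", "google_keys", "openai_keys" (hypothetical helper)
    return Path(key_dir, f"{model}.text").read_text().strip()

openai_key = load_api_key("openai_keys")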

For open-source MLLMs

Download your model or get its name from Huggingface, then replace the path below with the location of your model or its Huggingface name.


# Initialize variables
MODEL_NAME="your_path_to/idefics-9b-instruct" # you can replace it by direct naming
DATA_DIR=""
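
If the model is available through Huggingface Transformers, loading the IDEFICS example above could look like the sketch below. This is an assumption for illustration, not the repository's own loading code; other open-source MLLMs ship their own model classes and loading recipes:

import torch
from transformers import AutoProcessor, IdeficsForVisionText2Text

model_name = "your_path_to/idefics-9b-instruct"  # local path or Huggingface model name
processor = AutoProcessor.from_pretrained(model_name)
model = IdeficsForVisionText2Text.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)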

Step 2. Run evaluation (main.py). Next, run the experiments/main.py file directly, or execute the .sh scripts we provide for evaluation:

cd experiments/scripts

bash run_instructblip.sh

📜 License

The new contributions to our dataset are distributed under the CC BY-SA 4.0 license.

:coffee: Stay Connected!

We are always open to engaging discussions, collaborations, or even just sharing a virtual coffee. To get in touch or join our team, visit TurningPoint AI's homepage for contact information.

:white_check_mark: Cite

If you find MOSSBench useful for your research and applications, please kindly cite using this BibTeX:

@misc{li2024mossbenchmultimodallanguagemodel,
      title={MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?}, 
      author={Xirui Li and Hengguang Zhou and Ruochen Wang and Tianyi Zhou and Minhao Cheng and Cho-Jui Hsieh},
      year={2024},
      eprint={2406.17806},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.17806}, 
}

MOSSBench Website

The MOSSBench website is adapted from the Nerfies and MathVista websites.

Website License

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.