
<p align="center"> <h1 align="center"> <img src="images/hobby.png" alt="PNG Image" width="25" height="25"> Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images</h1> </p> <div align="center">

Zhiyuan Li · Heng Wang · Dongnan Liu · Chaoyi Zhang · Ao Ma · Jieting Long · Weidong Cai

School of Computer Science, The University of Sydney

<a href='https://mucr-benchmark.github.io/'><img src='https://img.shields.io/badge/Project-Page-green'></a> <a href='https://arxiv.org/pdf/2408.08105'><img src='https://img.shields.io/badge/Arxiv-Paper-red'></a> <a href='https://huggingface.co/datasets/Pinkygin/MuCR'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>

</div> <p align="center"> <b> [<a href="https://arxiv.org/abs/2408.08105">ArXiv</a>] | [<a href="https://huggingface.co/datasets/Pinkygin/MuCR">🤗HuggingFace</a>] | [<a href="https://mucr-benchmark.github.io/">Website</a>] </b> <br /> </p>

We propose MuCR to challenge Vision Large Language Models (VLLMs) to infer semantic cause-and-effect relationships between siamese images, relying solely on visual cues such as action, appearance, clothing, and environment.

<img src='images/picture3.png'>
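As a concrete illustration of the task format, below is a minimal, hedged sketch of querying a VLLM on one siamese pair via the OpenAI Python SDK. The model name, prompt wording, and image paths are illustrative assumptions, not the benchmark's official evaluation pipeline:

```python
# Minimal sketch: probe a VLLM on one cause/effect image pair (OpenAI SDK).
# Model name, prompt wording, and file paths are illustrative assumptions;
# this is NOT the benchmark's official evaluation script.
import base64
from openai import OpenAI

def encode_image(path: str) -> str:
    """Base64-encode a local image for the API's data-URL format."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
cause_b64 = encode_image("images/cause.jpg")    # assumed local paths
effect_b64 = encode_image("images/effect.jpg")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "The first image depicts a cause and the second its effect. "
                     "Which visual cue links them causally?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{cause_b64}"}},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{effect_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```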

## Release

## Demos

### Overview

<p align="center"> <img src="images/Picture6.png"> </p>

### Model Performance

<p align="center"> <img src="images/performance.png"> </p>

### Detailed Examples

<p align="center"> <img src="images/human2.png" alt="Image 1" style="display: inline-block;"> <img src="images/animal3.png" alt="Image 2" style="display: inline-block;"> <img src="images/plant4.png" alt="Image 3" style="display: inline-block;"> <img src="images/character5.png" alt="Image 4" style="display: inline-block;"> <img src="images/mixture6.png" alt="Image 6" style="display: inline-block;"> </p>

## Download

You can download the dataset directly from [Hugging Face](https://huggingface.co/datasets/Pinkygin/MuCR), or load it with the `datasets` library as follows:

```python
import datasets

# Load from a local copy of the data; pass "Pinkygin/MuCR" instead to fetch from the Hub.
dataset = datasets.load_dataset("data/")
```
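Once loaded, each split behaves like a list of dicts keyed by the fields described in the next section; a quick hedged sanity check (the split name `train` is an assumption):

```python
# Inspect the first record of the (assumed) "train" split.
print(dataset["train"][0]["caption_0"])
```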

## Dataset Form

Each line of the JSONL file must follow the format below:

```json
{
  "id": "ID",
  "caption_0": "...",
  "caption_1": "...",
  "link_id": "[a,b,c]",
  "cue": "cue",
  "false_cue": ["false_cue1", "false_cue2", "false_cue3"],
  "style": "style",
  "label": "label",
  "causal_reason": ["Explanation_1", "Explanation_2", "Explanation_3"],
  "image_0": "cause image",
  "image_1": "effect image"
}
```
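If you prefer to work with the raw files directly, the sketch below reads records from a local JSONL file; the file path is an assumed example and the field usage follows the schema above:

```python
import json

# Read MuCR records from a local JSONL file (path is an assumed example).
with open("data/mucr.jsonl", "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

for rec in records[:3]:
    # Each record pairs a cause image with an effect image plus textual cues.
    print(rec["id"], ":", rec["image_0"], "->", rec["image_1"])
    print("  cue:", rec["cue"], "| label:", rec["label"])
```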

## Reference

If you find this project useful for your research, please consider citing the following paper:

```bibtex
@article{li2024multimodal,
  title={Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images},
  author={Li, Zhiyuan and Wang, Heng and Liu, Dongnan and Zhang, Chaoyi and Ma, Ao and Long, Jieting and Cai, Weidong},
  journal={arXiv preprint arXiv:2408.08105},
  year={2024}
}
```