
<div align="center"> <img src="assets/memvrlogo.png" width="270px"> </div>

<h2 align="center"> <a href="https://arxiv.org/abs/2410.03577">Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models</a></h2>

<h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest update.</h5>

πŸ“£ News

🎯 Overview

We propose Memory-Space Visual Retracing (MemVR), a novel hallucination-mitigation paradigm that requires neither external knowledge retrieval nor additional fine-tuning. MemVR has two significant advantages, illustrated below:

*(Figures illustrating MemVR's two advantages.)*

<div align="center"> <strong>It’s a game-changer for effectiveness and efficiency.</strong> </div>

Comprehensive experimental evaluations demonstrate that MemVR significantly mitigates hallucinations across various MLLMs and excels on general benchmarks without adding inference-time overhead.
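To make the mechanism concrete, below is a minimal, hypothetical sketch of entropy-triggered visual retracing during a forward pass. It is not the repository's actual implementation: `layers`, `lm_head`, `layer.ffn`, and `visual_tokens` are illustrative stand-ins, and the default values mirror the flags used in the evaluation scripts below.

```python
import torch
import torch.nn.functional as F

def next_token_entropy(logits: torch.Tensor) -> float:
    """Shannon entropy of the next-token distribution (uncertainty proxy)."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean().item()

def forward_with_memvr(layers, lm_head, hidden, visual_tokens,
                       retracing_ratio=0.12, entropy_threshold=0.75,
                       starting_layer=5, ending_layer=16):
    """Illustrative decoding pass: 'look twice' when uncertainty is high."""
    retraced = False
    for idx, layer in enumerate(layers):
        hidden = layer(hidden)
        # Consider retracing at most once, only inside the candidate window.
        if retraced or not (starting_layer <= idx <= ending_layer):
            continue
        if next_token_entropy(lm_head(hidden[:, -1])) > entropy_threshold:
            # Re-inject pooled visual evidence through this layer's FFN
            # ("memory space"), scaled by the retracing ratio.
            visual_memory = layer.ffn(visual_tokens).mean(dim=1, keepdim=True)
            hidden = hidden + retracing_ratio * visual_memory
            retraced = True
    return hidden
```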

πŸ•ΉοΈ Usage

Installation

  1. We recommend using LLaVA as the working environment. Clone the repository from LLaVA and set up the environment by running:

         git clone https://github.com/haotian-liu/LLaVA
         cd LLaVA
         conda create -n memvr python=3.10
         conda activate memvr
         pip install --upgrade pip
         pip install -e .
  2. After setting up, clone the repository from MemVR and move all of its contents (except README.md) into the main directory of LLaVA:

         LLaVA/
         β”œβ”€β”€ llava/
         β”‚   β”œβ”€β”€ eval/        # merge here in the next step
         β”‚   └── ...
         β”œβ”€β”€ eval_scripts/
         β”‚   β”œβ”€β”€ llava/
         β”‚   β”œβ”€β”€ qwen/
         β”‚   └── glm/
         β”œβ”€β”€ memvr.py
         β”œβ”€β”€ inference.py
         β”œβ”€β”€ images/
         β”‚   └── ...
         └── ...

Then merge the contents of the provided eval folder into the directory

    /LLaVA/llava/eval/
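If you prefer to script this merge, a small helper along the following lines works; the paths are illustrative and assume you run it from the LLaVA root with MemVR's eval folder already copied there (dirs_exist_ok requires Python 3.8+):

```python
import shutil

# Merge MemVR's eval folder into LLaVA's existing llava/eval directory:
# dirs_exist_ok=True keeps files already present and overwrites only on
# name collisions instead of raising FileExistsError.
shutil.copytree("eval", "llava/eval", dirs_exist_ok=True)
```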

Downloading Checkpoints

Under the main directory of LLaVA:

  1. Download the checkpoint of LLaVA v1.5 here.
  2. Download the checkpoint of Qwen-VL-Chat here. Replace the downloaded modeling_qwen.py with the modeling_qwen.py provided in this repository to enable MemVR on the Qwen-VL-Chat model.
  3. Download the checkpoint of glm-4v-9b here. Replace the downloaded modeling_chatglm.py with the modeling_chatglm.py provided in this repository to enable MemVR on the GLM-4V-9B model.
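After swapping in a modified modeling file, one way to verify the checkpoint still loads is through transformers with trust_remote_code, pointing at the local checkpoint directory so the replaced file is the code that actually executes. A sketch (the path is yours; the checkpoint's own version requirements still apply):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point at the LOCAL checkpoint directory: with trust_remote_code=True,
# transformers runs the modeling_qwen.py found there, so a clean load
# confirms the replaced file is intact and importable.
path = "./Qwen-VL-Chat"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True)
print(type(model).__name__)
```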

You may check that your environment works by running:

    python inference.py

Evaluation

Follow Evaluation.md in LLaVA to prepare the benchmark materials. Additionally, we recommend GPUs with at least 40 GB of VRAM. Test on these benchmarks by running, for example:

    bash eval_scripts/llava/mme.sh

Please note that you may need to fill in your own OpenAI API key for GPT-based evaluations such as LLaVA-Bench or MM-Vet.
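How the key is consumed depends on the individual evaluation script; a common convention is the OPENAI_API_KEY environment variable, which can be set for the current process like so (illustrative; check the script you are running):

```python
import os

# Many GPT-based evaluation harnesses read the key from the environment.
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key
```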

Here are some tips on the parameters in the scripts:

    --retracing-ratio 0.12 \
    --entropy-threshold 0.75 \
    --starting-layer 5 \
    --ending-layer 16 \

Here, --retracing-ratio scales the visual evidence that is re-injected during retracing, --entropy-threshold sets the uncertainty level above which retracing is triggered, and --starting-layer / --ending-layer delimit the range of layers at which retracing may occur (see the sketch in the Overview above).
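If you wire the same knobs into your own driver script, a minimal argparse setup could look like this (hypothetical; the repository's scripts may parse these flags differently):

```python
import argparse

# Expose the MemVR-style decoding knobs used in the eval scripts.
parser = argparse.ArgumentParser(description="MemVR decoding knobs (sketch)")
parser.add_argument("--retracing-ratio", type=float, default=0.12,
                    help="strength of the re-injected visual evidence")
parser.add_argument("--entropy-threshold", type=float, default=0.75,
                    help="uncertainty level that triggers retracing")
parser.add_argument("--starting-layer", type=int, default=5,
                    help="first layer eligible for retracing")
parser.add_argument("--ending-layer", type=int, default=16,
                    help="last layer eligible for retracing")
args = parser.parse_args()
print(vars(args))
```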

πŸ… Experiments

Figure 5. Results on MMBench. MemVR enhances comprehensive performance on diverse tasks.

πŸ“Œ Examples

Figure 9. Visualization of uncertainty across layers without and with MemVR. MemVR effectively reduces uncertainty after the 8th layer, contributing to hallucination mitigation.

Figure 10. A case study in long text generation. MemVR effectively mitigates hallucinations.

✏️ Citation

If you find this paper useful, please consider starring 🌟 this repo and citing πŸ“‘ our paper:

    @article{zou2024memvr,
      title={Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models},
      author={Xin Zou and Yizhou Wang and Yibo Yan and Sirui Huang and Kening Zheng and Junkai Chen and Chang Tang and Xuming Hu},
      journal={arXiv preprint arXiv:2410.03577},
      year={2024}
    }

πŸ“ Related Projects

Star History
