# Awesome MLLM Hallucination <!-- omit in toc -->
Hallucination of Multimodal Large Language Models: A Survey
:star: News! We have released a comprehensive survey of MLLM hallucination.
<p align="center"> <img src="assets/tax.png" alt="TAX" style="display: block; margin: 0 auto;" /> </p>

This is a repository for organizing papers, code, and other resources related to hallucination in Multimodal Large Language Models (MLLMs), also known as Large Vision-Language Models (LVLMs).
Hallucination in LLMs usually refers to the phenomenon where the generated content is nonsensical or unfaithful to the provided source content, e.g., violating the input instruction or containing factual errors. In the context of MLLMs, hallucination refers to generated text that is semantically coherent but inconsistent with the given visual content. The community has been making steady progress on analyzing, detecting, and mitigating hallucination in MLLMs.
## :books: How to read?
The main contribution of each paper is either a new hallucination benchmark (or metric) or a hallucination mitigation method; hallucination analysis and detection are usually only part of a paper, serving as the basis for evaluation and mitigation. Therefore, we divide the papers into two categories: **hallucination evaluation & analysis** and **hallucination mitigation**. Within each category, papers are listed from newest to oldest. Note that some papers appear in both categories, because they contain both an evaluation benchmark and a mitigation method.
:high_brightness: This project is still ongoing; pull requests are welcome!!
If you have any suggestions (missing papers, new papers, key researchers, or typos), please feel free to edit the list and submit a pull request. Even just letting us know the titles of missing papers is a great contribution. You can also open an issue or contact us directly via email.
:star: If you find this repo useful, please star it!!!
## Table of Contents <!-- omit in toc -->

- [Hallucination Survey](#hallucination-survey)
- [Hallucination Evaluation & Analysis](#hallucination-evaluation--analysis)
- [Hallucination Mitigation](#hallucination-mitigation)
## Hallucination Survey
- Hallucination of Multimodal Large Language Models: A Survey (Apr. 30, 2024)
## Hallucination Evaluation & Analysis
- **EventHallusion**: Diagnosing Event Hallucinations in Video LLMs (Sep. 25, 2024)
- **FIHA**: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs (Sep. 20, 2024)
- **QL-Bench**: Explore the Hallucination on Low-level Perception for MLLMs (Sep. 15, 2024)
- **ODE**: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models (Sep. 14, 2024)
- **Pfram**: Understanding Multimodal Hallucination with Parameter-Free Representation Alignment (Sep. 02, 2024)
- Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data (Aug. 30, 2024)
- **Reefknot**: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models (Aug. 18, 2024)
- **Hallu-PI**: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs (Aug. 02, 2024)
- **HaloQuest**: A Visual Hallucination Dataset for Advancing Multimodal Reasoning (Jul. 22, 2024)
- **ROPE**: Multi-Object Hallucination in Vision-Language Models (Jul. 08, 2024)
- **VideoHallucer**: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models (Jun. 24, 2024)
- **HQHBench**: Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models (Jun. 24, 2024)
- Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? (Jun. 20, 2024)
- **VGA**: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning (Jun. 20, 2024)
- **BEAF**: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models (Jun. 18, 2024)
- Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? (Jun. 18, 2024)
- **MFC-Bench**: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models (Jun. 17, 2024)
- **AutoHallusion**: Automatic Generation of Hallucination Benchmarks for Vision-Language Models (Jun. 16, 2024)
- **Med-HallMark**: Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (Jun. 14, 2024)
- **MetaToken**: Detecting Hallucination in Image Descriptions by Meta Classification (May. 29, 2024)
- **THRONE**: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models (May. 08, 2024, CVPR 2024)
- **VALOR-EVAL**: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (Apr. 22, 2024)
- **ALOHa**: A New Measure for Hallucination in Captioning Models (Apr. 03, 2024, NAACL 2024)
- **UPD**: Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (Mar. 29, 2024)
- **IllusionVQA**: A Challenging Optical Illusion Dataset for Vision Language Models (Mar. 23, 2024)
- **CounterAnimal**: Do CLIPs Always Generalize Better than ImageNet Models? (Mar. 18, 2024)
- **PhD**: A Prompted Visual Hallucination Evaluation Dataset (Mar. 17, 2024)
- AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models (Mar. 13, 2024)
- **Hal-Eval**: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models (Feb. 24, 2024)
- **VHTest**: Visual Hallucinations of Multi-modal Large Language Models (Feb. 22, 2024)
- **MAD-Bench**: How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts (Feb. 20, 2024)
- **MHaluBench**: Unified Hallucination Detection for Multimodal Large Language Models (Feb. 20, 2024)
- **VQAv2-IDK**: Visually Dehallucinative Instruction Generation: Know What You Don't Know (Feb. 15, 2024)
- **CorrelationQA**: The Instinctive Bias: Spurious Images Lead to Hallucination in MLLMs (Feb. 06, 2024)
- **MMVP**: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs (Jan. 11, 2024)
- **MOCHa (OpenCHAIR)**: Multi-Objective Reinforcement Mitigating Caption Hallucinations (Dec. 06, 2023)
- **FGHE**: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (Dec. 04, 2023)
- **MERLIM**: Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models (Dec. 03, 2023)
- **CCEval**: HallE-Switch: Controlling Object Hallucination in Large Vision Language Models (Dec. 03, 2023)
- **HallusionBench**: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models (Nov. 28, 2023)
- **RAH-Bench**: Mitigating Hallucination in Visual Language Models with Visual Supervision (Nov. 27, 2023)
- **AMBER**: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation (Nov. 13, 2023)
- **Bingo**: Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges (Nov. 7, 2023)
- **FAITHSCORE**: Evaluating Hallucinations in Large Vision-Language Models (Nov. 2, 2023)
- **HaELM**: Evaluation and Analysis of Hallucination in Large Vision-Language Models (Oct. 10, 2023)
- **NOPE**: Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models (Oct. 9, 2023)
- **LRV (GAVIE)**: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (Sep. 29, 2023)
- **MMHal-Bench**: Aligning Large Multimodal Models with Factually Augmented RLHF (Sep. 25, 2023)
- **CIEM**: Contrastive Instruction Evaluation Method for Better Instruction Tuning (NeurIPS Workshop)
- **POPE**: Evaluating Object Hallucination in Large Vision-Language Models (EMNLP 2023)
- **CHAIR**: Object Hallucination in Image Captioning (EMNLP 2018)
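Several of the object-level benchmarks above (POPE, AMBER, THRONE) build on CHAIR-style object matching: compare the objects mentioned in a generated caption against the objects actually present in the image. As a rough illustration of the idea only (this is not any paper's reference implementation; the function name, toy captions, and simple word-level matching are ours, while real implementations match against curated object-synonym lists):

```python
def chair_scores(captions, gt_objects, vocab):
    """Sketch of CHAIR-style scoring.

    captions:   list of generated caption strings
    gt_objects: list of sets of objects actually present in each image
    vocab:      set of object words to scan for (e.g. MSCOCO object names)

    Returns (CHAIR-i, CHAIR-s):
      CHAIR-i = hallucinated object mentions / all object mentions
      CHAIR-s = captions with at least one hallucinated object / all captions
    """
    mentioned = hallucinated = sent_hall = 0
    for cap, gt in zip(captions, gt_objects):
        objs = {w for w in cap.lower().split() if w in vocab}  # mentioned objects
        halls = objs - gt                                      # not in the image
        mentioned += len(objs)
        hallucinated += len(halls)
        sent_hall += bool(halls)
    chair_i = hallucinated / max(mentioned, 1)
    chair_s = sent_hall / max(len(captions), 1)
    return chair_i, chair_s
```

For example, if one of two captions mentions a "dog" that is not in its image, CHAIR-s is 0.5, while CHAIR-i depends on how many object mentions there are in total.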
## Hallucination Mitigation
- **MemVR**: Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models (Oct. 7, 2024)
- **HELPD**: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (Sep. 30, 2024)
- **Dentist**: A Unified Hallucination Mitigation Framework for Large Vision-Language Models (Sep. 22, 2024, TMLR)
- **PACU**: Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization (Sep. 22, 2024)
- **RBD**: Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding (Sep. 10, 2024)
- **MVP**: Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning (Aug. 30, 2024)
- **ConVis**: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models (Aug. 25, 2024)
- **CLIP-DPO**: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs (Aug. 19, 2024)
- **SID**: Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models (Aug. 04, 2024)
- **ARA**: Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation (Aug. 01, 2024)
- **PAI**: Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs (Jul. 31, 2024)
- **MAD**: Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate (Jul. 30, 2024)
- **VACoDe**: Visual Augmented Contrastive Decoding (Jul. 26, 2024)
- **REVERIE**: Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models (Jul. 16, 2024)
- **BACON**: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations (Jul. 03, 2024)
- **Pelican**: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification (Jul. 02, 2024)
- **MMHalSnowball**: Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (Jun. 30, 2024)
- **AGLA**: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention (Jun. 18, 2024)
- **MedThink**: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More (Jun. 17, 2024)
- **TUNA**: Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags (Jun. 16, 2024)
- **CODE**: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models (Jun. 04, 2024)
- **NoiseBoost**: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models (May. 30, 2024)
- **RITUAL**: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs (May. 28, 2024)
- **HALVA**: Mitigating Object Hallucination via Data Augmented Contrastive Tuning (May. 28, 2024)
- **AvisC**: Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models (May. 28, 2024)
- **RLAIF-V**: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness (May. 27, 2024)
- **HIO**: Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization (May. 24, 2024)
- **VDGD**: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap (May. 24, 2024)
- **VFC**: Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation (Apr. 30, 2024, CVPR 2024)
- **SoM-LLaVA**: List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs (Apr. 25, 2024)
- **Cantor**: Inspiring Multimodal Chain-of-Thought of MLLM (Apr. 24, 2024)
- **HSA-DPO**: Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback (Apr. 22, 2024)
- **FACT**: Teaching MLLMs with Faithful, Concise and Transferable Rationales (Apr. 17, 2024)
- **SeVa**: Self-Supervised Visual Preference Alignment (Apr. 16, 2024)
- **DFTG**: Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning (Apr. 16, 2024)
- **FGAIF**: Aligning Large Vision-Language Models with Fine-grained AI Feedback (Apr. 07, 2024)
- **ICD**: Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding (Mar. 27, 2024, ACL 2024)
- **ESREAL**: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models (Mar. 24, 2024)
- **Pensieve**: Retrospect-then-Compare Mitigates Visual Hallucination (Mar. 21, 2024)
- **M3ID**: Multi-Modal Hallucination Control by Visual Information Grounding (Mar. 20, 2024)
- **DVP**: What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models (Mar. 20, 2024)
- **AIT**: Mitigating Dialogue Hallucination for Large Multi-modal Models via Adversarial Instruction Tuning (Mar. 15, 2024)
- **HALC**: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding (Mar. 01, 2024)
- **IBD**: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding (Feb. 28, 2024)
- **CGD**: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding (Feb. 23, 2024)
- **Less is More**: Mitigating Multimodal Hallucination from an EOS Decision Perspective (Feb. 22, 2024)
- **LogicCheckGPT**: Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models (Feb. 18, 2024)
- **POVID**: Aligning Modalities in Vision Large Language Models via Preference Fine-tuning (Feb. 18, 2024)
- **EFUF**: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models (Feb. 15, 2024)
- **IDK-Instruction**: Visually Dehallucinative Instruction Generation: Know What You Don't Know (Feb. 15, 2024)
- **MARINE**: Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance (Feb. 13, 2024)
- **Skip \n**: A Simple Method to Reduce Hallucination in Large Vision-Language Models (Feb. 12, 2024)
- **ViGoR**: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling (Feb. 09, 2024)
- **LAR-LAF**: Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study (Jan. 31, 2024)
- **Silkie**: Preference Distillation for Large Visual Language Models (Dec. 17, 2023)
- **HACL**: Hallucination Augmented Contrastive Learning for Multimodal Large Language Model (Dec. 12, 2023)
- **MOCHa (OpenCHAIR)**: Multi-Objective Reinforcement Mitigating Caption Hallucinations (Dec. 06, 2023)
- **FGHE**: Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites (Dec. 04, 2023)
- **HallE-Switch**: Controlling Object Hallucination in Large Vision Language Models (Dec. 03, 2023)
- **RLHF-V**: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback (Dec. 01, 2023)
- **OPERA**: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation (Nov. 29, 2023)
- **VCD**: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding (Nov. 28, 2023)
- **HA-DPO**: Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization (Nov. 28, 2023)
- **RAH-Bench**: Mitigating Hallucination in Visual Language Models with Visual Supervision (Nov. 27, 2023)
- **HalluciDoctor**: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Nov. 22, 2023)
- **Volcano**: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision (Nov. 14, 2023)
- **Woodpecker**: Hallucination Correction for Multimodal Large Language Models (Oct. 24, 2023)
- **LURE**: Analyzing and Mitigating Object Hallucination in Large Vision-Language Models (Oct. 1, 2023)
- **LRV-Instruction**: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning (Sep. 29, 2023)
- **LLaVA-RLHF**: Aligning Large Multimodal Models with Factually Augmented RLHF (Sep. 25, 2023)
- **VIGC**: Visual Instruction Generation and Correction (Sep. 11, 2023)
- **HalDetect**: Detecting and Preventing Hallucinations in Large Vision Language Models (Aug. 18, 2023)
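Many of the entries above (VCD, ICD, SID, CODE, HALC, IBD, RBD, ConVis, VACoDe) are training-free contrastive-decoding methods that share one core step: contrast the next-token logits conditioned on the clean visual input against logits from a degraded or substitute input, so that tokens driven by the language prior rather than the image are suppressed. As a minimal sketch of that shared step only (the function name and toy logits are ours; real methods differ in what they contrast against and typically add an adaptive plausibility constraint before sampling):

```python
import numpy as np

def contrastive_decode(logits_clean, logits_contrast, alpha=1.0):
    """Contrastive-decoding sketch.

    logits_clean:    next-token logits conditioned on the original image
    logits_contrast: logits conditioned on a degraded input (e.g. a noised
                     image in VCD, or a disturbed instruction in ICD)
    alpha:           contrast strength; alpha=0 recovers standard decoding

    Tokens scored highly in both branches are likely language-prior guesses
    and get down-weighted; tokens supported only by the clean image survive.
    """
    return (1 + alpha) * logits_clean - alpha * logits_contrast
```

In a toy two-token vocabulary where the degraded branch still strongly prefers token 0 (a prior-driven guess), the contrasted logits shift the argmax toward the token the clean image actually supports.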