FaithScore


This is the official release accompanying our paper, FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models. FAITHSCORE is also available as a pip package.

If you find FAITHSCORE useful, please cite:

@inproceedings{jing-etal-2024-faithscore,
    title = "{F}aith{S}core: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models",
    author = "Jing, Liqiang  and
      Li, Ruosen  and
      Chen, Yunmo  and
      Du, Xinya",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.290",
    pages = "5042--5063"
}

Process

Figure: overview of the FAITHSCORE evaluation process.

Install

  1. Install LLaVA 1.5.

    Note that you do not need to download the LLaVA 1.5 weights if you use OFA for fact verification.

  2. Install ModelScope.

    pip install modelscope
    pip install "modelscope[multi-modal]" 
    
  3. Install our package.

    pip install faithscore==0.0.9
    
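After installation, a quick import check confirms the package is visible to Python (this only verifies the install, not any model checkpoints):

from faithscore.framework import FaithScore  # should import without errors
print("faithscore package installed")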

Running FaithScore Using the Pip Package

You can evaluate answers generated by large vision-language models via our metric.

from faithscore.framework import FaithScore

# One answer to evaluate against its source image.
images = ["./COCO_val2014_000000164255.jpg"]
answers = ["The main object in the image is a colorful beach umbrella."]

# vem_type selects the model used for fact verification (e.g., OFA or LLaVA);
# api_key is your OpenAI API key; the truncated paths should point to your
# local LLaVA / LLaMA checkpoints.
scorer = FaithScore(vem_type="...", api_key="...", llava_path=".../llava/eval/checkpoints/llava-v1.5-13b", use_llama=False,
                    llama_path="...llama2/llama-2-7b-hf")
score, sentence_score = scorer.faithscore(answers, images)
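
Since answers and images are passed as parallel lists, several outputs can be scored in one call, with the i-th answer checked against the i-th image. A small sketch continuing from the scorer above (the image paths and answers are placeholders):

images = [
    "./COCO_val2014_000000164255.jpg",
    "./COCO_val2014_000000525439.jpg",
]
answers = [
    "The main object in the image is a colorful beach umbrella.",
    "The skateboard is positioned on a ramp, with the skateboarder standing on it.",
]
score, sentence_score = scorer.faithscore(answers, images)
print("FaithScore:", score)
print("Sentence-level FaithScore:", sentence_score)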

Parameters for FaithScore class:

Running FaithScore from the Command Line

You can also evaluate generated answers with the following command:

python run.py --answer_path {answer_path} --openai_key {openai_key} --vem_type {vem_type} --llava_path {llava_path} --llama_path {llama_path} --use_llama {use_llama}
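
For instance, an invocation might look like the following; every value here, including the choice of OFA as the verification model and all paths, is an illustrative placeholder rather than a shipped default:

python run.py --answer_path ./answers.json --openai_key $OPENAI_API_KEY --vem_type ofa --llava_path ./checkpoints/llava-v1.5-13b --llama_path ./llama-2-7b-hf --use_llama False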

Parameters:

Data

Annotation Data

The annotation data is provided in a JSON-formatted file. For example:

{"id": "000000525439", "answer": "The skateboard is positioned on a ramp, with the skateboarder standing on it.", "stage 1": {"The skateboard is positioned on a ramp": 1, " with the skateboarder standing on it": 1}, "stage 2": {"There is a skateboard.": 1, "There is a ramp.": 0, "There is a skateboarder.": 1, "The skateboarder is standing on a skateboard.": 0}}

Data Format:

You can download our annotation dataset.
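
To show how these records might be consumed, here is a minimal sketch that tallies the stage-2 labels; it assumes the annotation file stores one JSON object per line, and the file name is a placeholder:

import json

# Count how many stage-2 atomic facts carry label 1 across the annotation file.
# "annotations.jsonl" is a placeholder name; adjust it to the downloaded file.
labeled_one, total = 0, 0
with open("annotations.jsonl") as f:
    for line in f:
        record = json.loads(line)
        for label in record["stage 2"].values():
            labeled_one += label
            total += 1

print(f"{labeled_one}/{total} stage-2 atomic facts labeled 1")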

Automatic Evaluation Benchmarks

You can download our automatic evaluation benchmarks.

Leaderboard

Public LVLM leaderboard computed on LLaVA-1k.

Model            FaithScore    Sentence-level FaithScore
Multimodal-GPT   0.53          0.49
MiniGPT-4        0.57          0.65
mPLUG-Owl        0.72          0.70
InstructBLIP     0.81          0.72
LLaVA            0.84          0.73
LLaVA-1.5        0.86          0.77

Public LVLM leaderboard computed on MSCOCO-Cap.

Model            FaithScore    Sentence-level FaithScore
Multimodal-GPT   0.54          0.63
MiniGPT-4        0.64          0.60
mPLUG-Owl        0.85          0.67
InstructBLIP     0.94          0.80
LLaVA            0.87          0.64
LLaVA-1.5        0.94          0.83