FAVA

Intro

FAVA is a hallucination detection and editing model. You can find a model demo here, model weights here, and our datasets here. This repo covers synthetic data generation for training FAVA as well as our evaluation setups.

<p align="center"><img src="https://github.com/abhika-m/FAVA/blob/main/fava.png" alt="FAVA" width="500"/></p>
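
As a quick reference, detection and editing with the released weights looks roughly like the sketch below (using vLLM). The model id and prompt wording here are placeholders rather than the exact template, so check the demo and model weights links above before relying on them.

# Minimal sketch of running FAVA for detection + editing with vLLM.
# The model id and prompt template below are placeholders; see the model card for the exact format.
from vllm import LLM, SamplingParams

PROMPT = (
    "Read the following references:\n{evidence}\n"
    "Please identify all the errors in the following text using the information "
    "in the references and suggest edits if necessary:\n"
    "[Text] {output}\n[Edited] "
)

model = LLM("fava-uw/fava-model")  # placeholder model id
sampling_params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=1024)

evidence = "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911)."
output = "Marie Curie won the Nobel Prize in Literature in 1903."

result = model.generate([PROMPT.format(evidence=evidence, output=output)], sampling_params)
print(result[0].outputs[0].text)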

Overview

  1. Installation
  2. Synthetic Data Generation
  3. Postprocess Data for Training
  4. Retrieval Guide
  5. FActScore Evaluations
  6. Fine-grained Sentence Detection Evaluations

Install

conda create -n fava python=3.9
conda activate fava
pip install -r requirements.txt
python -m spacy download en_core_web_sm

Training

Step 1: Synthetic Data Generation

Our synthetic data generation takes in a Wikipedia passage and its title, diversifies the passage into another genre of text, and then inserts errors one by one using ChatGPT and GPT-4.
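
generate_train_data.py drives the full pipeline with its own prompts; as a rough illustration of the error-insertion step alone, a single call might look like the sketch below. The prompt wording, tag format, and model name are illustrative assumptions, not the script's actual prompts.

# Sketch of inserting one error into a clean passage with an OpenAI chat model.
# The prompt, tag format, and model name are illustrative, not the repo's actual prompts.
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")  # placeholder key

def insert_error(passage: str, error_type: str, model: str = "gpt-4") -> str:
    prompt = (
        f"Insert exactly one {error_type} error into the passage below and wrap the "
        f"inserted span in <{error_type}> tags. Keep everything else unchanged.\n\n"
        f"Passage: {passage}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

passage = "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity."
print(insert_error(passage, "entity"))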

Running Data Generation

cd training
python generate_train_data.py \
--input_file {input_file_path} \
--output_file {output_file_path} \
--openai_key {your_openai_key}

Input file is jsonl and includes:

Output file includes:

Step 2: Process Training Data

Post Processing

cd training
python process_train_data.py \
--input_file {input_file_path} \
--output_file {output_file_path}

Input file is json and includes:

Output file includes:
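
As a quick sanity check on the processed data, you can count how often each error tag shows up in the training targets. The tag names below follow the paper's error taxonomy and are an assumption about the exact markup, so adjust them to whatever the processing script actually emits.

# Count occurrences of FAVA-style error tags in the processed training file.
# Tag names follow the paper's taxonomy and are assumptions about the exact markup.
import re
from collections import Counter

ERROR_TAGS = ["entity", "relation", "contradictory", "invented", "subjective", "unverifiable"]

text = open("processed_train_data.json").read()  # placeholder path
counts = Counter({tag: len(re.findall(f"<{tag}>", text)) for tag in ERROR_TAGS})
print(counts)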

Step 3: Training

We followed Open-Instruct's training script to train FAVA: we ran the script with train_file pointing to our processed training data from Step 2 and used Llama-2-Chat 7B as our base model.

You can find our training data here.

Retrieval Guide

We use Contriever to retrieve documents.

Step 1: Download data

Download the preprocessed passage data and the generated passage embeddings (Contriever-MSMARCO).

cd retrieval
wget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
wget https://dl.fbaipublicfiles.com/contriever/embeddings/contriever-msmarco/wikipedia_embeddings.tar
gunzip psgs_w100.tsv.gz
tar -xvf wikipedia_embeddings.tar

Step 2: Collect Retrieved Passages

We retrieve the top 5 documents, but you can adjust num_docs to your liking.

cd retrieval
python passage_retrieval.py \
    --model_name_or_path facebook/contriever-msmarco --passages psgs_w100.tsv \
    --passages_embeddings "wikipedia_embeddings/*" \
    --data {input_file_path}  \
    --output_dir {output_file_path} \
    --n_docs {num_docs}

Input file is either a json or jsonl and includes:
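
Contriever's passage_retrieval.py attaches the retrieved passages to each query record; in its usual output format this is a ctxs list with title, text, and score fields, though you should double-check the fields it actually writes. A small helper like the one below can flatten the top passages into a single evidence string to pass to FAVA.

# Flatten retrieved passages into a single evidence string per query.
# The "ctxs"/"title"/"text" field names follow Contriever's usual output format
# and are assumptions; adjust them to the file passage_retrieval.py actually writes.
import json

def build_evidence(record: dict, k: int = 5) -> str:
    passages = record.get("ctxs", [])[:k]
    return "\n\n".join(f"{p.get('title', '')}: {p.get('text', '')}" for p in passages)

with open("retrieved_output.json") as f:  # placeholder path
    records = json.load(f)

print(build_evidence(records[0]))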

Evaluations

We provide two main evaluation setups: FActScore and our own fine-grained error detection task.

FActScore

cd eval
python run_eval.py \
    --model_name_or_path {model_name_or_path} \
    --input_file {input_file_path} \
    --output_file {output_file_path} \
    --metric factscore \
    --openai_key {your_openai_key}

Input file is json and includes:

The FActScore dataset can be downloaded from here. We used the Alpaca 7B, Alpaca 13B, and ChatGPT data from FActScore.

Fine-grained Sentence Detection

cd eval
python run_eval.py \
    --model_name_or_path {model_name_or_path} \
    --input_file {input_file_path} \
    --output_file {output_file_path} \
    --metric detection

Input file is json and includes:

You can find our human annotation data here.
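
Conceptually, the detection metric reduces to binary agreement between the sentences the model flags for a given error type and the annotators' labels, along the lines of the toy example below. This is only an illustration of the task, not the exact computation in run_eval.py.

# Toy illustration of sentence-level detection scoring for one error type:
# 1 = the sentence contains this error type, 0 = it does not.
# This mirrors the idea of the metric, not the exact computation in run_eval.py.
from sklearn.metrics import f1_score

gold = [1, 0, 0, 1, 0]  # annotator labels for five sentences
pred = [1, 0, 1, 1, 0]  # model predictions for the same sentences

print(f"F1 = {f1_score(gold, pred):.3f}")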

Optional flags:

Citation

@article{mishra2024finegrained,
    title={Fine-grained Hallucination Detection and Editing for Language Models},
    author={Mishra, Abhika and Asai, Akari and Balachandran, Vidhisha and Wang, Yizhong and Neubig, Graham and Tsvetkov, Yulia and Hajishirzi, Hannaneh},
    journal={arXiv preprint arXiv:2401.06855},
    year={2024},
    url={https://arxiv.org/abs/2401.06855}
}