FACTOR

This repo contains the data and evaluation code for AI21 Labs' paper Generating Benchmarks for Factuality Evaluation of Language Models.

Data

We include the following FACTOR benchmarks for evaluating the factuality of language models:

wiki_factor (./data/wiki_factor.csv): based on Wikipedia text
news_factor (./data/news_factor.csv): based on news articles
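
As a quick sanity check, you can inspect a benchmark file with pandas. This is a minimal sketch; it assumes only that the benchmark is a CSV at the path used by the evaluation command below:

import pandas as pd

# Load one FACTOR benchmark; each row is one evaluation example.
df = pd.read_csv("./data/wiki_factor.csv")
print(df.shape)             # (number of examples, number of columns)
print(df.columns.tolist())  # the column names used by the benchmark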

Evaluation

Setup

To install the libraries required by this repo, run:

pip install -r requirements.txt

If you need a PyTorch build that matches your CUDA version, install it before running the command above.
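
For example, for CUDA 11.8 (an illustrative command; pick the wheel index that matches your CUDA toolkit from the official PyTorch installation page):

pip install torch --index-url https://download.pytorch.org/whl/cu118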

List of Language Models

In the paper, we report results for the following models (replace $MODEL_NAME with one of them).

Evaluation Script

To evaluate a model on a FACTOR benchmark, use the following command:

python eval_factuality.py \
--data_file ./data/wiki_factor.csv \
--output_folder $OUTPUT_DIR \
--model_name $MODEL_NAME
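
For example, to evaluate GPT-2 on wiki_factor and write results to ./results (the model name and output folder here are illustrative placeholders, not values fixed by the repo):

python eval_factuality.py \
--data_file ./data/wiki_factor.csv \
--output_folder ./results \
--model_name gpt2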

License

Citation

If you find our paper or code helpful, please cite:

@article{muhlgay2023generating,
  title={Generating benchmarks for factuality evaluation of language models},
  author={Muhlgay, Dor and Ram, Ori and Magar, Inbal and Levine, Yoav and Ratner, Nir and Belinkov, Yonatan and Abend, Omri and Leyton-Brown, Kevin and Shashua, Amnon and Shoham, Yoav},
  journal={arXiv preprint arXiv:2307.06908},
  year={2023}
}