<p align="center"> <img src="./logo.png" width="700"> </p>
The Instruction-Based Benchmark for Text Improvements
The EditEval benchmark is described in the following paper: https://arxiv.org/abs/2209.13331
@inproceedings{dwivedi-edit-2022,
doi = {10.48550/ARXIV.2209.13331},
url = {https://arxiv.org/abs/2209.13331},
author = {Dwivedi-Yu, Jane and Schick, Timo and Jiang, Zhengbao and Lomeli, Maria and Lewis, Patrick and Izacard, Gautier and Grave, Edouard and Riedel, Sebastian and Petroni, Fabio},
keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences},
title = {EditEval: An Instruction-Based Benchmark for Text Improvements},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International}
}
Leaderboard
The leaderboard for this benchmark can be found on EvalAI.
Installation
conda create -n editeval -y python=3.7 && conda activate editeval
pip install -e .
Additional dependencies
The FRUIT dataset requires that you install gsutil.
Downloading datasets
The commands below download datasets to the directory /data. To specify a different output directory, use output_directory={path_to_output_dir}.
For a single dataset run:
python main.py --dataset_name {dataset_name}
For all datasets run:
python main.py --dataset_name all
Writing datasets to jsonl files
For a single dataset run:
python main.py --dataset_name {dataset_name} --write_to_jsonl
For all datasets run:
python main.py --dataset_name all --write_to_jsonl
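Once a dataset has been written out, the jsonl file can be inspected from Python. The following is a minimal sketch, assuming only the standard JSON Lines layout (one JSON object per line). The helper name `read_jsonl` is hypothetical and not part of this repository, and the field names inside each record are not documented here, so check them on your own output.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Load a JSON Lines file: one JSON object per non-empty line.

    Hypothetical helper for illustration; not part of the EditEval code.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

Calling `read_jsonl` on a written file and printing `records[0].keys()` is a quick way to see which fields a dataset exposes.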
Sampling datasets
python main.py --dataset_name jfleg --sample {num_examples_to_sample}
Running evaluation for a dataset
python main.py --dataset_name {dataset_name} --prediction_file {path_to_jsonl}
To specify certain metrics (e.g., gleu and sari):
python main.py --dataset_name {dataset_name} --prediction_file {path_to_jsonl} --metrics gleu sari
To turn off normalization during evaluation, specify --no_normalization.
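When evaluating several prediction files in a script, it can help to assemble the command line programmatically. The sketch below only combines the flags shown above (--dataset_name, --prediction_file, --metrics, --no_normalization). The function `build_eval_command` is a hypothetical wrapper, not something shipped with the benchmark.

```python
from typing import List, Optional, Sequence


def build_eval_command(
    dataset_name: str,
    prediction_file: str,
    metrics: Optional[Sequence[str]] = None,
    normalize: bool = True,
) -> List[str]:
    """Assemble the argv list for one evaluation run of main.py.

    Hypothetical helper; it only mirrors the flags documented in this README.
    """
    command = [
        "python", "main.py",
        "--dataset_name", dataset_name,
        "--prediction_file", prediction_file,
    ]
    if metrics:
        # Metrics follow a single --metrics flag, separated by spaces.
        command += ["--metrics", *metrics]
    if not normalize:
        command.append("--no_normalization")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root, which avoids shell quoting issues with file paths.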
Current tasks and datasets
- Fluency
  - jfleg
  - iterater_fluency
- Clarity
  - iterater_clarity
- Coherence
  - iterater_coherence
- Paraphrasing
  - stsb_multi_mt
- Simplification
  - turk
  - asset
- Neutralization
  - wnc
- Updating
  - fruit
  - wafer_insert
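For scripting over tasks, the grouping above can be captured as a small lookup table. This is a sketch for convenience: the names `TASK_DATASETS` and `task_for_dataset` are hypothetical, and the lowercase task keys are an assumption made here, not identifiers used by main.py.

```python
from typing import Dict, List

# Task -> dataset names, copied from the list above.
# The lowercase task keys are an assumption for illustration only.
TASK_DATASETS: Dict[str, List[str]] = {
    "fluency": ["jfleg", "iterater_fluency"],
    "clarity": ["iterater_clarity"],
    "coherence": ["iterater_coherence"],
    "paraphrasing": ["stsb_multi_mt"],
    "simplification": ["turk", "asset"],
    "neutralization": ["wnc"],
    "updating": ["fruit", "wafer_insert"],
}


def task_for_dataset(dataset_name: str) -> str:
    """Return the task a dataset belongs to, or raise KeyError if unknown."""
    for task, datasets in TASK_DATASETS.items():
        if dataset_name in datasets:
            return task
    raise KeyError(f"unknown dataset: {dataset_name}")
```

Each dataset name in the table is a value that can be passed to --dataset_name.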
Current metrics
- sari
- em
- em_diff
- bleu
- ibleu
- gleu
- rouge
- update_rouge
- bert_score
Licensing
See our LICENSE file for licensing details.