# ZeroSCROLLS

This repository contains code to run inference on the ZeroSCROLLS benchmark.
## Setup

- Install `torch`.
- Install `transformers` 4.30.2.
- Run `pip install -r requirements.txt`.
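For example, assuming `pip` and the pinned `transformers` version above:

```bash
pip install torch
pip install transformers==4.30.2
pip install -r requirements.txt
```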
## Load the data
- Via the 🤗 [Datasets](https://github.com/huggingface/datasets) library (recommended); a quick sanity-check snippet follows this list:

  ```python
  from datasets import load_dataset

  gov_report = load_dataset("tau/zero_scrolls", "gov_report", split="test")
  """
  Options are: ["gov_report", "summ_screen_fd", "qmsum", "squality", "qasper",
  "narrative_qa", "quality", "musique", "space_digest", "book_sum_sort"]
  There is also a small number of examples (~20 per task) in a "validation" split,
  meant for eyeballing purposes.
  """
  ```
- Via ZIP files, where each split is stored in a JSONL file.
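As a quick sanity check after loading, you can inspect one example; this sketch relies only on the 🤗 Datasets call above and makes no assumptions about field names:

```python
from datasets import load_dataset

# The small "validation" split (~20 examples per task) is meant for eyeballing.
gov_report = load_dataset("tau/zero_scrolls", "gov_report", split="validation")

print(len(gov_report))       # number of examples in the split
print(gov_report[0].keys())  # fields available for each example
```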
## Inference with Hugging Face models
```bash
python experiments/hf/run_hf_model.py --model-name=google/flan-t5-small
```
Supported models:
- google/flan-t5-small
- google/flan-t5-base
- google/flan-t5-large
- google/flan-t5-xl
- google/flan-t5-xxl
- google/flan-ul2
- bigscience/T0pp
To add new models:

- Add them to `model_to_max_input_tokens` in `experiments/hf/run_hf_model.py`.
- Make sure to load them with the appropriate architecture, i.e., modify the model initialization from `T5ForConditionalGeneration` in the same file if needed; a sketch follows this list.
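For illustration, a hedged sketch of both steps; the dictionary name comes from `run_hf_model.py`, while the new entry, the token limits, and the `load_model` helper are assumptions for this example:

```python
from transformers import AutoModelForCausalLM, T5ForConditionalGeneration

# Step 1: register the model and its maximum input length
# (both values below are illustrative, not the repo's actual limits).
model_to_max_input_tokens = {
    "google/flan-t5-xxl": 8192,
    "my-org/my-new-model": 4096,  # hypothetical new model
}

# Step 2: pick the architecture class when initializing
# (the actual script uses T5ForConditionalGeneration; swap in the
# appropriate class for non-T5 checkpoints).
def load_model(model_name: str):
    if "t5" in model_name or "T0" in model_name:
        return T5ForConditionalGeneration.from_pretrained(model_name)
    return AutoModelForCausalLM.from_pretrained(model_name)
```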
## Inference with APIs
To run with the models used in the paper*:

```bash
# if you want to use OpenAI models
export OPENAI_API_KEY=<insert token here>
export OPENAI_ORG=<insert org here>

# if you want to use Anthropic models
export ANTHROPIC_API_KEY=<insert token here>

# if you want to limit the number of examples to run per task
export MAX_EXAMPLES=10

python experiments/api/run_api_model.py --model_name=gpt-3.5-turbo --limit_to_n_examples=$MAX_EXAMPLES
```
*These models and APIs tend to update; see the paper for the exact versions used in the baselines.
Models supported:
- text-davinci-003
- gpt-3.5-turbo
- gpt-4
- claude-v1
To add a new API, you need to:

- Implement a new class that inherits from `APIRunner`; a minimal sketch follows this list.
- Working examples for the OpenAI and Anthropic APIs can be found in `openai_api.py` and `anthropic_api.py`.
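A minimal sketch of such a runner, assuming a module path and a `generate` method name that may differ from the actual `APIRunner` interface in `experiments/api/`:

```python
from experiments.api.api_runner import APIRunner  # assumed module path


class MyProviderRunner(APIRunner):
    """Hypothetical runner for a new completion API."""

    def __init__(self, model_name: str):
        super().__init__(model_name)

    def generate(self, prompt: str, max_new_tokens: int) -> str:
        # Call your provider's completion endpoint here and return the text;
        # the method name and signature are assumptions, not the repo's API.
        raise NotImplementedError
```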
When using a prompt that ends with an opening XML tag (e.g. "... Assistant: <answer>"), make sure to post-process the generations before submitting, keeping only the text the model produced before the closing XML tag.
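For example, a minimal post-processing helper (the `<answer>` tag comes from the example above; the function itself is ours, not part of the repo):

```python
def keep_prefix_before_closing_tag(generation: str, tag: str = "answer") -> str:
    """Keep only the text the model produced before its closing XML tag."""
    return generation.split(f"</{tag}>", 1)[0].strip()

print(keep_prefix_before_closing_tag("42</answer> As an AI model ..."))  # -> "42"
```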
## Prepare submission
To create a CSV file in the correct format for a leaderboard submission, we recommend using our conversion script, `prepare_submission.py`.

Its inputs: for each task, the predictions should be in a JSON file mapping each example ID to a textual prediction:
```json
{
  "example_id1": "prediction1",
  "example_id2": "prediction2",
  ...
}
```
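For example, writing one task's predictions in this format (the path and IDs are illustrative):

```python
import json

predictions = {"example_id1": "prediction1", "example_id2": "prediction2"}
with open("gov_report_preds.json", "w") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```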
Please set:

- `{dataset_name}_PREDS_FILE` to the path of a JSON file in the format above containing your predictions for `{dataset_name}`.
- `OUTPUT_DIR` to the path where the submission file should be saved.
Run:
```bash
python submission/prepare_submission.py \
    --gov_report_file GOV_REPORT_PREDS_FILE \
    --summ_screen_fd_file SUMM_SCREEN_FD_PREDS_FILE \
    --qmsum_file QMSUM_PREDS_FILE \
    --squality_file SQUALITY_PREDS_FILE \
    --qasper_file QASPER_PREDS_FILE \
    --narrative_qa_file NARRATIVE_QA_PREDS_FILE \
    --quality_file QUALITY_PREDS_FILE \
    --musique_file MUSIQUE_PREDS_FILE \
    --space_digest_file SPACE_DIGEST_PREDS_FILE \
    --book_sum_sort_file BOOK_SUM_SORT_PREDS_FILE \
    --output_dir OUTPUT_DIR
```
## Verify your submission file
Run:
```bash
python submission/verify_submission.py \
    --all_predictions SUBMISSION_FILE \
    --output_dir OUTPUT_DIR
```
A valid submission file will result in the following line being printed:

```
The verification was successful.
```

Please fix any errors before making your submission.
## Leaderboard
The live leaderboard is available at [zero.scrolls-benchmark.com](https://zero.scrolls-benchmark.com).
## Citation
```bibtex
@inproceedings{shaham-etal-2023-zeroscrolls,
    title = "{Z}ero{SCROLLS}: A Zero-Shot Benchmark for Long Text Understanding",
    author = "Shaham, Uri  and
      Ivgi, Maor  and
      Efrat, Avia  and
      Berant, Jonathan  and
      Levy, Omer",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-emnlp.536",
    doi = "10.18653/v1/2023.findings-emnlp.536",
    pages = "7977--7989"
}
```
If you find the ZeroSCROLLS data useful, please also cite the original dataset papers: [bibtex]