HAGRID: A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution
<p align="center"><img src="https://github.com/project-miracl/hagrid/blob/main/assets/icon.png?raw=true" alt="HAGRID" width="20%"><br> </p> <p align="center"> <a href="https://www.python.org/"> <img alt="Build" src="https://img.shields.io/badge/Made%20with-Python-1f425f.svg?color=purple"> </a> <a href="https://github.com/project-miracl/hagrid/blob/master/LICENSE"> <img alt="License" src="https://img.shields.io/github/license/project-miracl/hagrid"> </a> <a href="https://arxiv.org/abs/2307.16883"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2307.16883-b31b1b.svg"> </a> </p>
HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) is a dataset for generative information-seeking scenarios. It is built on top of MIRACL 🌍🙌🌏, an information retrieval dataset consisting of queries paired with manually labelled relevant passages (quotes).
We collect attributed explanations for each question by prompting GPT-3.5 with the relevant passages for that question. The explanations follow an in-context citation style, similar to that of scientific articles, in which statements cite the supporting quotes. We then ask human annotators to judge each explanation on two criteria:
- Informativeness: whether they provide a direct answer to the question.
- Attributability: whether they are attributable to the source passages.
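To make the citation style concrete, here is a minimal sketch that extracts citation markers from an explanation and flags any that point past the supplied quotes. It assumes citations are written as bracketed, 1-based quote numbers such as `[1]`; `cited_quotes` and `dangling_citations` are hypothetical helpers, not part of the HAGRID release. This is only a structural check: whether a cited quote actually supports its sentence is what the human attributability judgment captures.

```python
import re

# Assumption: citations are bracketed 1-based quote numbers, e.g. "[1]" or "[2][3]".
CITATION = re.compile(r"\[(\d+)\]")


def cited_quotes(explanation: str) -> list[int]:
    """Return the distinct quote numbers cited in an explanation, in order of first use."""
    seen: list[int] = []
    for match in CITATION.finditer(explanation):
        n = int(match.group(1))
        if n not in seen:
            seen.append(n)
    return seen


def dangling_citations(explanation: str, num_quotes: int) -> list[int]:
    """Return cited numbers that do not refer to any of the num_quotes provided quotes."""
    return [n for n in cited_quotes(explanation) if not 1 <= n <= num_quotes]


explanation = "The Eiffel Tower is in Paris [1]. It was completed in 1889 [2][1]. See also [4]."
print(cited_quotes(explanation))           # [1, 2, 4]
print(dangling_citations(explanation, 3))  # [4]
```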
Data
HAGRID is hosted on Hugging Face 🤗 as `miracl/hagrid`. To load the training split:
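```python
import datasets

# Load the training split of HAGRID from the Hugging Face Hub
hagrid = datasets.load_dataset("miracl/hagrid", split="train")
print(hagrid[0])  # inspect the first example
```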
| Split | #Queries | #Answers | #Informativeness | #Attributability |
|---|---|---|---|---|
| Train | 1,922 | 3,214 | 3,214 | 754 |
| Dev | 716 | 1,318 | 1,157 | 826 |
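The sketch below shows one way to tally per-split counts like these from loaded records. It assumes a hypothetical schema in which each record has an `answers` list and each answer carries `informative` and `attributable` keys that are `None` or absent when that criterion was not annotated. Both the field names and this reading of the label columns are assumptions, so compare them against the output of `print(hagrid[0])` and adapt the field access before relying on the numbers. The toy records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SplitStats:
    queries: int = 0
    answers: int = 0
    informativeness_labels: int = 0
    attributability_labels: int = 0


def tally(records) -> SplitStats:
    """Count queries, answers, and answers carrying each label.

    Hypothetical schema: each record is a dict with an "answers" list; each answer
    may hold "informative" and "attributable" keys, None or absent if not annotated.
    """
    stats = SplitStats()
    for record in records:
        stats.queries += 1
        for answer in record.get("answers", []):
            stats.answers += 1
            if answer.get("informative") is not None:
                stats.informativeness_labels += 1
            if answer.get("attributable") is not None:
                stats.attributability_labels += 1
    return stats


toy = [
    {"query": "q1", "answers": [{"informative": 1, "attributable": 1},
                                {"informative": 0, "attributable": None}]},
    {"query": "q2", "answers": [{"informative": 1}]},
]
print(tally(toy))
# SplitStats(queries=2, answers=3, informativeness_labels=3, attributability_labels=1)
```

A label value of `0` still counts as annotated here; only a missing or `None` value is treated as unjudged.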
Baselines (Coming soon!)
We are planning to release baseline models soon! Stay tuned!
Contact
If you have any questions, feel free to email us (project.miracl [at] gmail.com) or open a GitHub issue in this repository.
License
This work is licensed under the Apache 2 license. See LICENSE for details.
Citation
If you find this dataset and repository helpful, please cite HAGRID as follows:
```bibtex
@article{hagrid,
  title={{HAGRID}: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution},
  author={Ehsan Kamalloo and Aref Jafari and Xinyu Zhang and Nandan Thakur and Jimmy Lin},
  year={2023},
  journal={arXiv:2307.16883},
}
```