

ProtoQA Dataset

This repository contains the ProtoQA dataset, which is built around "Family Feud"-style questions. See the paper for details on dataset creation.

Data Files:

Each line of a data file is a JSON dictionary.

For a full description of the data format, see DATAFORMAT.md.
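
Because each line is a standalone JSON dictionary, the files can be read one line at a time. The following is a minimal sketch, not the authors' loader: the function name `iter_jsonl` is hypothetical, and it assumes the files are UTF-8 encoded. The keys of each dictionary are not assumed here; see DATAFORMAT.md for them.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one dictionary per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; UTF-8 encoding is an assumption.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{line_number}: expected a JSON object")
            yield record
```

Calling `list(iter_jsonl(path))` with the path of a data file loads the whole file into memory; iterating over the generator directly avoids that for larger files.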

File organization:


This repository contains a data statement, based on Datasheets for Datasets (Gebru et al. 2020) and earlier NLP-specific work (Bender and Friedman 2018), to provide transparency about how the data may be used and to encourage others to publish similar statements. This is a preliminary version of the statement; please open an issue in the repository or contact the authors if you have questions about the data or suggestions regarding its use.