Moved
The QA-SRL Bank 2.0, client library, and other related resources are now all maintained in one place, at julianmichael/qasrl.
The QA-SRL Bank
This repository is the reference point for QA-SRL Bank 2.0, the dataset described in the paper Large-Scale QA-SRL Parsing.
The data may be downloaded here, or you can clone this repository and run ./download.sh.
Contents
When you run ./download.sh, the dataset will be downloaded and expanded into the data/qasrl-v2/ directory. Its contents are as follows:
data/qasrl-v2:
- orig/: The original data gathered on MTurk, where workers wrote the questions.
- expanded/: The expanded dataset with model-generated questions and answers gathered in our expansion round. Train and dev only.
- dense/: The densely annotated data, combining the expanded data with extra model-generated questions and judgments from turkers on a 5k-sentence subset of dev and test.
- index.json.gz: An index of the documents that were used across all partitions, with metadata.
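As a quick sanity check after downloading, the layout described above can be verified with a few lines of Python. This is only a sketch: the helper name missing_entries is hypothetical and not part of this repository, and it checks nothing beyond the four entries listed above.

```python
from pathlib import Path

# Entries this README lists under data/qasrl-v2/.
EXPECTED_ENTRIES = ("orig", "expanded", "dense", "index.json.gz")


def missing_entries(root):
    """Return the expected entries that do not exist under `root`.

    Hypothetical helper for illustration; it only tests for existence
    and does not inspect the contents of any directory or file.
    """
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("data/qasrl-v2")
    if missing:
        print("Missing from data/qasrl-v2:", ", ".join(missing))
    else:
        print("All expected entries are present.")
```

If anything is reported missing, re-running ./download.sh is the first thing to try.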
If you are modeling the data, you will probably be using orig or expanded for training and tuning, and orig and dense for evaluation.
Metadata is included in each set, allowing you to determine which round a question or answer judgment originated from.
See the Data Format description for details on how the data files are laid out.
Using the QA-SRL Bank
Once you have downloaded the data, you can use your favorite JSON parsing or data reading library to process and iterate through it. However, there are some options already available:
- If you're using Python (or particularly AllenNLP), you can use the dataset reading code from our model.
- If you're using Scala, we have a client library.
- If you're using something else and write your own, please contribute it (or a reference to it)!
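If you only need to iterate through the raw records, a standard-library reader may be enough. The sketch below assumes the data files are gzip-compressed with one JSON object per line; that is an assumption here, so check the Data Format description before relying on it. The function name iter_json_lines and the path in the usage comment are hypothetical.

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed JSON value per non-empty line of a gzip-compressed file.

    Assumes (not confirmed by this README) that the file is UTF-8 text
    with one JSON document per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Example usage (the file name is a placeholder, not a documented path):
#
#     for record in iter_json_lines("data/qasrl-v2/orig/<some-file>.gz"):
#         print(type(record))
#         break
```

Reading lazily with a generator keeps memory use flat, which helps when a partition is large. This does not replace the dataset readers linked above, which understand the record structure.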