# DROP Explorer
This is a data explorer for the dataset DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs.
Available for immediate use at http://eladsegal.github.io/DROP-explorer.
- Dataset files are available here.
- Prediction file examples are available here.
For DROP, use the standardized dataset file here for correct alignment.
For Quoref, use the DROPified dataset file here.
## Things to know
- Clicking on a question row will highlight a gold answer and the predicted answer.
- Multi-span answers are sorted.
- The first answer displayed is the one in "answer"; the rest come from "validated_answers", and only distinct answers are displayed.
- Be aware that heads with certain names are handled in a special way:
  - Predictions from the "counting" head won't be highlighted.
  - Predictions from the "arithmetic" head are expected to have a member named "numbers" in "answer", which is an array (see the sketch right after this list).
## Expected Predictions File Format
The expected predictions file format is JSONL, where each line is a JSON object that is the output_dict
of an instance prediction.
The following members are used and are required unless mentioned otherwise (they don't have to be correct, but they must have the correct type). A complete example is sketched after the list:
- passage_id: string
- query_id: string
- answer: A JSON object with the following members:
  - value: The final prediction; a string or an array of strings
  - spans (optional): An array of arrays of the form
    ["p" (passage) / "q" (question), start_index, end_index (exclusive)]
    used to make the spans that the model used for its prediction bold.
  - numbers (required and used only when the head is "arithmetic"): An array of objects of the form
    {"value": number, "sign": -1/0/1}
    used to construct the arithmetic expression used to arrive at the answer.
- predicted_ability: The name of the head used for prediction
- maximizing_ground_truth: The answer for which the highest EM and F1 scores were calculated, in the same format as an answer in the dataset.
- em (optional): The EM score calculated, a number
- f1 (optional): The F1 score calculated, a number
- max_passage_length (optional): The length of the passage that was considered for the model prediction, used to show which parts of the passage were truncated
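To make the format concrete, below is a minimal sketch (in Python) of writing one prediction line. All concrete values (IDs, answer text, numbers, scores) are invented for illustration; only the member names and types follow the description above, and the shape of maximizing_ground_truth assumes the standard DROP answer format.

```python
import json

# One prediction in the expected format; every concrete value here is hypothetical.
prediction = {
    "passage_id": "nfl_2201",           # string
    "query_id": "f37e81fa",             # string
    "answer": {
        "value": "5",                   # final prediction: string or array of strings
        # Optional: spans in the passage ("p") or question ("q") to render in bold,
        # each given as [source, start_index, end_index (exclusive)].
        "spans": [["p", 104, 118]],
        # Required only for the "arithmetic" head: the numbers and signs that
        # form the arithmetic expression, here 12 - 7 = 5.
        "numbers": [
            {"value": 12, "sign": 1},
            {"value": 7, "sign": -1},
        ],
    },
    "predicted_ability": "arithmetic",  # name of the head used for prediction
    # Assumed to follow the DROP answer format (number / date / spans).
    "maximizing_ground_truth": {
        "number": "5",
        "date": {"day": "", "month": "", "year": ""},
        "spans": [],
    },
    "em": 1.0,                          # optional
    "f1": 1.0,                          # optional
    "max_passage_length": 400,          # optional
}

# Append the prediction as one JSONL line.
with open("predictions.jsonl", "a") as f:
    f.write(json.dumps(prediction) + "\n")
```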