
CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

This repository contains the data and code for the baseline described in the following paper:

CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge<br/> Yasumasa Onoe, Michael J.Q. Zhang, Eunsol Choi, Greg Durrett<br/> NeurIPS 2021 Datasets and Benchmarks Track

@article{onoe2021creak,
  title={CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge},
  author={Onoe, Yasumasa and Zhang, Michael J.Q. and Choi, Eunsol and Durrett, Greg},
  journal={OpenReview},
  year={2021}
}

***** [New] November 8th, 2021: The contrast set has been updated. *****

We have increased the size of the contrast set to 500 examples. Please refer to the paper for the updated numbers.

Datasets

Examples


Data Files

CREAK data files are located under data/creak.

The data files are formatted as jsonlines. Here is a single training example:

{
    "ex_id": "train_1423",
    "sentence": "Lauryn Hill separates two valleys as it is located between them.",
    "explanation": "Lauren Hill is actually a person and not a mountain.",
    "label": "false",
    "entity": "Lauryn Hill",
    "en_wiki_pageid": "162864",
    "entity_mention_loc": [[0, 11]]
}
| Field | Description |
| --- | --- |
| ex_id | Example ID |
| sentence | Claim |
| explanation | Annotator's explanation of why the claim is TRUE/FALSE |
| label | Label: 'true' or 'false' |
| entity | Seed entity |
| en_wiki_pageid | English Wikipedia page ID for the seed entity |
| entity_mention_loc | Location(s) of the seed entity in the claim |
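
As a quick illustration, here is a minimal sketch of loading a split with Python's standard library. The filename `train.json` is an assumption; point the path at whichever file under `data/creak` you want to read.

```python
import json

# Hypothetical path: adjust to the actual file name under data/creak.
TRAIN_PATH = "data/creak/train.json"

def load_jsonl(path):
    """Read a jsonlines file (one JSON object per line) into a list of dicts."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples

train_examples = load_jsonl(TRAIN_PATH)

# Each example is a dict with the fields described in the table above.
first = train_examples[0]
print(first["ex_id"], first["label"], first["sentence"])

# entity_mention_loc holds character spans of the seed entity within the claim.
for start, end in first["entity_mention_loc"]:
    print(first["sentence"][start:end])
```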

Baselines

See this README for instructions on running the baselines.

Leaderboards

https://www.cs.utexas.edu/~yasumasa/creak/leaderboard.html

We host results only for Closed-Book methods that have been fine-tuned only on In-Domain data.

To submit your results, please send your system name and prediction files for the dev, test, and contrast sets to yasumasa@utexas.edu.

Contact

Please contact us at yasumasa@utexas.edu if you have any questions.