WinoWhy

This is the GitHub repository for the ACL 2020 paper "WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge".

Dependency

Python 3.6, PyTorch 1.1

Introduction of WinoWhy

This repo includes the original Winograd Schema Challenge (WSC) dataset and 4095 WinoWhy reasons (15 for each of the 273 WSC questions) that could justify the pronoun coreference choices in WSC.

WinoWhy contains 3 sources of reasons: (1) Human; (2) Human Reverse; (3) Generation Model. Each WSC question has 5 reasons from each source.

Here are the descriptions and examples of reasons from these sources. The examples are based on the WSC question: "The city councilmen refused the demonstrators a permit because they feared violence. Does the 'they' refer to 'the city councilmen' or 'the demonstrators'?". The reasons are based on the question "The 'they' refers to the city councilmen because...". The paired question of this WSC changes "feared" to "advocated".

| Source | Description | Example |
|---|---|---|
| Human | Reasons provided by human beings. | city councilmen are administrative so they are more likely to fear. |
| Human Reverse | Human reasons for the paired WSC question. | the demonstrators were the ones who needed a permit. |
| Generation Model | The reasons generated by GPT-2 with the same question. | they are under the command of Mayor James B. Gray. |

The collected reasons are then used to categorize which types of commonsense knowledge are needed to solve each WSC question. The selected knowledge types are as follows (note that a question may require knowledge from multiple categories):

| Name (# of questions) | Definition | Example |
|---|---|---|
| Property (32) | Knowledge about property of objects. | ice is cold. |
| Object (82) | Knowledge about objects. | cats have ears. |
| Eventuality (88) | Knowledge about eventualities. | 'wake up' happens before 'open eyes'. |
| Spatial (64) | Knowledge about spatial position. | object at the back can be blocked. |
| Quantity (20) | Knowledge about numbers. | 2 is smaller than 10. |
| Others (48) | All other knowledge. | NA |

Data Format of WinoWhy

There are two data files in the repo:

winowhy.json: the WSC dataset and corresponding WinoWhy questions.

cat_ref.json: the knowledge categories and indexes of corresponding WSC questions.

WinoWhy Dataset

Dataset: a list of 273 WSC questions.

    WSC Question: a dictionary. The key: value pairs are:

        "text": a dictionary of the original WSC text. The key: value pairs are:

            "txt1": a string of text before the pronoun;

            "pron": a string of the target pronoun;

            "txt2": a string of text after the pronoun;

        "answers": a list of strings of candidate answer spans;

        "correctAnswer": A or B;

        "source": original WSC source;

        "reasons": a list of the WinoWhy reasons:

            WinoWhy Reason: a list holding the information for one reason:

                reason[0]: reason text;

                reason[1]: reason source (human, gpt, reverse);

                reason[2]: reason plausibility;

                reason[3]: reason label (Valid, Invalid, Undecided)
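
The layout above can be walked with a few lines of Python. This is a minimal sketch, not code from the repo: the helper names (`load_winowhy`, `rebuild_sentence`, `count_reason_labels`) are hypothetical, joining the three text pieces with single spaces is an assumption, and the `sample` record uses placeholder values for "source", plausibility, and labels rather than values copied from winowhy.json.

```python
import json
from collections import Counter


def load_winowhy(path="winowhy.json"):
    """Load the dataset: a list of WSC question dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def rebuild_sentence(question):
    """Join txt1, pron, and txt2 back into one sentence.

    Joining with single spaces is an assumption; the file may already
    carry its own spacing.
    """
    text = question["text"]
    parts = (text["txt1"], text["pron"], text["txt2"])
    return " ".join(part.strip() for part in parts)


def count_reason_labels(dataset):
    """Count reason labels (reason[3]) over all questions."""
    return Counter(reason[3] for q in dataset for reason in q["reasons"])


# A hand-written record in the documented layout. The "source",
# plausibility, and label values are placeholders for illustration.
sample = {
    "text": {
        "txt1": "The city councilmen refused the demonstrators a permit because",
        "pron": "they",
        "txt2": "feared violence.",
    },
    "answers": ["The city councilmen", "The demonstrators"],
    "correctAnswer": "A",
    "source": "placeholder",
    "reasons": [
        ["city councilmen are administrative so they are more likely to fear.",
         "human", 1.0, "Valid"],
        ["the demonstrators were the ones who needed a permit.",
         "reverse", 0.0, "Invalid"],
    ],
}

print(rebuild_sentence(sample))
print(count_reason_labels([sample]))
```

With the real file, `count_reason_labels(load_winowhy())` would give the label distribution over all reasons.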

Category Reference

Dataset: a dictionary. The key: values are:

    "Property", "Object", "Eventuality", "Spatial", "Quantity", "Others": a list of the indexes of the WSC questions.

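As a rough illustration of how the category reference might be used, the sketch below selects questions by category and looks up the categories of one question. It assumes the indexes in cat_ref.json are 0-based positions in the winowhy.json list, which the description above does not state; the function names and the toy data are hypothetical.

```python
def questions_in_category(dataset, cat_ref, category):
    """Return the questions whose indexes are listed under `category`.

    Assumes each index is a 0-based position in `dataset`.
    """
    return [dataset[int(i)] for i in cat_ref[category]]


def categories_of(cat_ref, index):
    """Return every category that lists the given question index."""
    return sorted(name for name, idxs in cat_ref.items() if index in idxs)


# Toy stand-ins for the two files; real data would come from json.load.
toy_dataset = ["q0", "q1", "q2"]
toy_cat_ref = {
    "Property": [0, 2],
    "Object": [1],
    "Eventuality": [],
    "Spatial": [2],
    "Quantity": [],
    "Others": [],
}

print(questions_in_category(toy_dataset, toy_cat_ref, "Property"))
print(categories_of(toy_cat_ref, 2))
```

Because a question can need knowledge from several categories, `categories_of` may return more than one name.
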
Performance of unsupervised models on WinoWhy

We first join the question and the reason into a single sentence by inserting a short connecting phrase between them (e.g., WSC Question+" The 'they' refers to the city councilmen because "+ Reason). We then feed the sentence to a model and take the returned probability as the prediction.
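
The concatenation step could look roughly like the sketch below. The `connect` and `score_inputs` helpers are hypothetical names, the connecting phrase is generalized from the single example above, and `score_fn` stands for whatever model wrapper returns a probability; the `dummy_score` used here is only a stand-in so the sketch runs without a model.

```python
def connect(wsc_sentence, pronoun, answer, reason):
    """Join a WSC sentence and a reason with a short connecting phrase."""
    return f"{wsc_sentence} The '{pronoun}' refers to {answer} because {reason}"


def score_inputs(inputs, score_fn):
    """Pair each connected sentence with the score a model returns for it.

    score_fn is a hypothetical callable (str -> float); a real language
    model wrapper would be plugged in here.
    """
    return [(text, score_fn(text)) for text in inputs]


wsc = ("The city councilmen refused the demonstrators a permit "
       "because they feared violence.")
human = "city councilmen are administrative so they are more likely to fear."
generated = "they are under the command of Mayor James B. Gray."

inputs = [connect(wsc, "they", "the city councilmen", r)
          for r in (human, generated)]


def dummy_score(text):
    """Stand-in scorer; NOT a real plausibility estimate."""
    return 0.5


scored = score_inputs(inputs, dummy_score)
print(scored[0][0])
```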

Performance of supervised models on WinoWhy

Similarly, we can regard WinoWhy as a binary classification problem that requires the model to distinguish valid from invalid reasons through supervised learning. You can run the supervised learning code with python supervised_winowhy.py. A processed dataset, with the reasons labeled Undecided removed for classification, is available in ./dataset/.
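
For readers who want to build the binary task from winowhy.json themselves, the filtering step might be sketched as follows. This is an assumption-laden illustration, not the preprocessing used for the files in ./dataset/: the function name `to_binary_examples`, the 1/0 label encoding, the case-insensitive label matching, and the toy records are all hypothetical.

```python
def to_binary_examples(dataset):
    """Turn reasons into (question_index, reason_text, label) triples.

    Valid maps to 1 and Invalid to 0; reasons with any other label
    (such as Undecided) are dropped.
    """
    label_map = {"valid": 1, "invalid": 0}
    examples = []
    for q_index, question in enumerate(dataset):
        for reason in question["reasons"]:
            label = label_map.get(str(reason[3]).strip().lower())
            if label is None:
                continue  # skip Undecided or unexpected labels
            examples.append((q_index, reason[0], label))
    return examples


# Toy records with placeholder reasons, plausibility values, and labels.
toy = [
    {"reasons": [
        ["reason a", "human", 1.0, "Valid"],
        ["reason b", "gpt", 0.5, "Undecided"],
    ]},
    {"reasons": [
        ["reason c", "reverse", 0.0, "Invalid"],
    ]},
]

print(to_binary_examples(toy))
```

Keeping the question index alongside each example makes it possible to split train and test sets by question rather than by reason.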

Citation

@inproceedings{zhang2020WinoWhy,
  author    = {Hongming Zhang and Xinran Zhao and Yangqiu Song},
  title     = {WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge},
  booktitle = {Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL) 2020},
  year      = {2020}
}

Others

If you have any other questions about this repo, you are welcome to open an issue or send me an email. I will respond as soon as possible.