HELP: A Dataset for Handling Entailments with Lexical and Logical Phenomena (Ver.1.0)

Overview

The HELP dataset is an automatically created natural language inference (NLI) dataset that combines lexical and logical inferences, focusing on monotonicity (i.e., phrase replacement-based reasoning). HELP (Ver.1.0) contains 36K inference pairs covering upward monotone, downward monotone, non-monotone, conjunction, and disjunction phenomena.
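To make the idea of phrase replacement-based reasoning concrete, here is a minimal toy sketch. It is not the procedure used to build HELP: the `HYPERNYMS` table, the `entailment_label` function, and the `entailment`/`neutral` label names are all assumptions made for illustration.

```python
# Toy lexicon (hypothetical): maps a word to a more general word.
HYPERNYMS = {"dogs": "animals"}


def entailment_label(polarity, original, replacement):
    """Label the inference obtained by swapping `original` for `replacement`.

    polarity is "upward" or "downward", describing the context in which
    the replaced word occurs.
    """
    more_general = HYPERNYMS.get(original) == replacement
    more_specific = HYPERNYMS.get(replacement) == original
    # Upward monotone contexts license replacement by a more general phrase,
    # e.g. "Some dogs ran" -> "Some animals ran".
    if polarity == "upward" and more_general:
        return "entailment"
    # Downward monotone contexts license replacement by a more specific phrase,
    # e.g. "No animals ran" -> "No dogs ran".
    if polarity == "downward" and more_specific:
        return "entailment"
    # Any other replacement is not guaranteed to hold.
    return "neutral"
```

For example, `entailment_label("upward", "dogs", "animals")` yields an entailment, while the reverse replacement in the same upward context does not.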

HELP dataset

output_en/
pmb_train_v1.0.tsv
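A minimal sketch for loading the TSV file and counting rows per label is shown below. The column layout is not described here, so the sketch assumes the file has a header row; the column names in the sample (`premise`, `hypothesis`, `label`) and the helper names `read_tsv` and `count_by` are hypothetical. Check the actual header before relying on them.

```python
import csv
import io
from collections import Counter


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def count_by(rows, column):
    """Count how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


# In-memory sample with assumed column names; for the real data, open the
# file instead, e.g. open("output_en/pmb_train_v1.0.tsv", encoding="utf-8").
sample = io.StringIO(
    "premise\thypothesis\tlabel\n"
    "Some dogs ran\tSome animals ran\tentailment\n"
    "Some animals ran\tSome dogs ran\tneutral\n"
)
rows = read_tsv(sample)
label_counts = count_by(rows, "label")
```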

Replicating the HELP

If you would like to replicate the HELP dataset, try the following procedure:

Environment

git clone https://github.com/verypluming/HELP.git
cd HELP
pyenv virtualenv 3.4.6 help
pyenv activate help
pip install -r requirements.txt
python -c "import nltk; nltk.download('wordnet')"

Installing C&C parser and Parallel Meaning Bank (PMB)

Please download C&C, set it up, and create a file data/parser_location.txt containing the path to the C&C parser. Then download PMB version 2.1.0 and place it in the data/ directory.

echo "candc:/path/to/candc-1.00/" > data/parser_location.txt
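Before running the data creation step, it can help to sanity-check the contents of data/parser_location.txt. The sketch below parses the `name:/path` format shown above; `parse_parser_location` is a hypothetical helper, and how the HELP scripts themselves read this file is not specified here.

```python
def parse_parser_location(text):
    """Parse lines of the form 'name:/path' into a {name: path} dict."""
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first colon only, so colons in the path are kept.
        name, sep, path = line.partition(":")
        if not sep or not path.strip():
            raise ValueError("expected 'name:/path', got: " + repr(line))
        locations[name.strip()] = path.strip()
    return locations


# Example with the placeholder path from the command above.
locations = parse_parser_location("candc:/path/to/candc-1.00/\n")
```

With a real installation, you could pass the file contents (for instance `open("data/parser_location.txt").read()`) and confirm that the `candc` entry points to an existing directory.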

Data creation

python scripts/create_dataset_PMB.py

Citation

If you use this dataset in any published research, please cite the following:

@InProceedings{yanaka-EtAl:2019:starsem,
  author    = {Yanaka, Hitomi and Mineshima, Koji  and  Bekki, Daisuke and Inui, Kentaro and Sekine, Satoshi and Abzianidze, Lasha and Bos, Johan},
  title     = {HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning},
  booktitle = {Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM2019)},
  year      = {2019},
}

Contact

For questions and usage issues, please contact hitomi.yanaka@riken.jp.

License

CC BY-SA 4.0

Acknowledgement

This work was conducted in collaboration with RIKEN, Ochanomizu University, and the University of Groningen. We thank the Parallel Meaning Bank (PMB).