


A benchmark for understanding and evaluating rationales: http://www.eraserbenchmark.com/

Core Files

The core files are utils and metrics. Together, these two files contain everything you need to work with our released datasets.

utils documents everything you need to know about our input formats. Output formats and validation code are covered in metrics.
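As a rough illustration of the kind of data utils parses, the sketch below reads an annotation file stored as JSON Lines (one JSON object per line). The file layout and the field names (annotation_id, classification, query, evidences) are assumptions made for this example, and load_annotations is a hypothetical helper rather than a function from utils. Treat utils as the authoritative description of the input format.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    # Field names are assumed for illustration; check utils for the real schema.
    annotation_id: str
    classification: str
    query: str = ""
    evidences: list = field(default_factory=list)


def load_annotations(path: str) -> List[Annotation]:
    """Hypothetical loader: read one JSON object per non-empty line into Annotation records."""
    annotations = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            annotations.append(
                Annotation(
                    annotation_id=record["annotation_id"],
                    classification=record["classification"],
                    # Optional fields fall back to empty values if absent.
                    query=record.get("query", ""),
                    evidences=record.get("evidences", []),
                )
            )
    return annotations
```

For example, load_annotations("data/movies/test.jsonl") would read a test split, assuming (hypothetically) that split files are stored as JSON Lines directly under the data directory.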


At the moment we offer two forms of pipeline models:

(Lehman et al., 2019) Pipeline

To run this model, we need to first:

Then we can run (as an example):

PYTHONPATH=./:$PYTHONPATH python rationale_benchmark/models/pipeline/pipeline_train.py --data_dir data/movies --output_dir output/movies --model_params params/movies.json
PYTHONPATH=./:$PYTHONPATH python rationale_benchmark/metrics.py --split test --data_dir data/movies --results output/movies/test_decoded.jsonl --score_file output/movies/test_scores.json
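The two commands above follow a fixed pattern: train into an output directory, then score the test_decoded.jsonl file that the first command produces there. The hypothetical helper pipeline_commands below (not part of the repository) rebuilds both commands for a given dataset directory and prints them as a dry run. It assumes, generalizing from the movies example, that the parameter file is named params/<dataset>.json and that the decoded file for a split is named <split>_decoded.jsonl.

```python
import shlex
from typing import List


def pipeline_commands(dataset: str, split: str = "test", output_root: str = "output") -> List[List[str]]:
    """Hypothetical helper: build the train and scoring argument lists.

    Paths follow the data/<dataset>, <output_root>/<dataset> and
    params/<dataset>.json pattern of the movies example; they are not
    checked for existence. The <split>_decoded.jsonl name is an assumption.
    """
    data_dir = f"data/{dataset}"
    out_dir = f"{output_root}/{dataset}"
    train = [
        "python", "rationale_benchmark/models/pipeline/pipeline_train.py",
        "--data_dir", data_dir,
        "--output_dir", out_dir,
        "--model_params", f"params/{dataset}.json",
    ]
    score = [
        "python", "rationale_benchmark/metrics.py",
        "--split", split,
        "--data_dir", data_dir,
        "--results", f"{out_dir}/{split}_decoded.jsonl",
        "--score_file", f"{out_dir}/{split}_scores.json",
    ]
    return [train, score]


# Dry run: print the shell commands instead of executing them.
for cmd in pipeline_commands("movies"):
    print("PYTHONPATH=./:$PYTHONPATH " + shlex.join(cmd))
```

To execute the commands instead of printing them, each list can be passed to subprocess.run(cmd, check=True, env=env), where env is a copy of os.environ with PYTHONPATH pointing at the repository root.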

BERT-to-BERT Pipeline

To run this model, the instructions are essentially the same as for the simple pipeline above, except that it also requires a GPU with approximately 16 GB of memory (e.g. a Tesla V100). The same caveats about batch sizes apply here as well.

Then we can run (as an example):

PYTHONPATH=./:$PYTHONPATH python rationale_benchmark/models/pipeline/bert_pipeline.py --data_dir data/movies --output_dir output_bert/movies --model_params params/movies.json
PYTHONPATH=./:$PYTHONPATH python rationale_benchmark/metrics.py --split test --data_dir data/movies --results output_bert/movies/test_decoded.jsonl --score_file output_bert/movies/test_scores.json
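Both pipelines hand their predictions to metrics.py as a JSON Lines file such as test_decoded.jsonl. The sketch below writes one prediction per line after a light structural check. The field names (annotation_id, classification, rationales, docid, hard_rationale_predictions, start_token, end_token) and the convention that end_token is exclusive are assumptions for illustration, and check_prediction and write_predictions are hypothetical helpers. metrics remains the authoritative description of the output format and its validation.

```python
import json
from typing import Dict, List


def check_prediction(pred: Dict) -> List[str]:
    """Return the problems found in one prediction record (empty list if none).

    The expected keys are assumptions for illustration, not the metrics.py schema.
    """
    problems = []
    if not isinstance(pred.get("annotation_id"), str):
        problems.append("missing or non-string annotation_id")
    for rationale in pred.get("rationales", []):
        for span in rationale.get("hard_rationale_predictions", []):
            start, end = span.get("start_token"), span.get("end_token")
            # Assumes end_token is exclusive, so empty or reversed spans are rejected.
            if not (isinstance(start, int) and isinstance(end, int) and 0 <= start < end):
                problems.append(f"bad span {span} in docid {rationale.get('docid')}")
    return problems


def write_predictions(preds: List[Dict], path: str) -> None:
    """Validate every record first, then write one JSON object per line."""
    for pred in preds:
        problems = check_prediction(pred)
        if problems:
            raise ValueError(f"{pred.get('annotation_id')}: {problems}")
    with open(path, "w", encoding="utf-8") as f:
        for pred in preds:
            f.write(json.dumps(pred) + "\n")
```

Validating every record before opening the output file means a malformed prediction raises an error without leaving a partially written file behind.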

For more examples, see the BERT-to-BERT reproduction.

More models, including Lei et al., can be found at: https://github.com/successar/Eraser-Benchmark-Baseline-Models