# WinoGrande

Version 1.1
## Data
Download the dataset by running `download_winogrande.sh`:

```
./data/
├── train_[xs,s,m,l,xl].jsonl          # training sets of different sizes
├── train_[xs,s,m,l,xl]-labels.lst     # answer labels for the training sets
├── dev.jsonl                          # development set
├── dev-labels.lst                     # answer labels for the development set
├── test.jsonl                         # test set
├── sample-submissions-labels.lst      # example submission file for the leaderboard
└── eval.py                            # evaluation script
```
You can use `train_*.jsonl` for training models and `dev` for validation.
Please note that labels are not included in `test.jsonl`. To evaluate your models on the test set, make a submission to our leaderboard.
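Each line of a `*.jsonl` file is a JSON object, and the matching `*-labels.lst` file has one answer (`1` or `2`) per line in the same order. As a minimal sketch of pairing the two (the field names `sentence`, `option1`, and `option2` are assumptions based on the published WinoGrande format; verify them against your downloaded files):

```python
import json

def load_split(jsonl_path, labels_path):
    # One JSON object per line; one gold label ("1" or "2") per line,
    # aligned by position. Both assumptions follow the layout above.
    with open(jsonl_path) as f:
        examples = [json.loads(line) for line in f]
    with open(labels_path) as f:
        labels = [line.strip() for line in f]
    assert len(examples) == len(labels)
    return list(zip(examples, labels))

if __name__ == "__main__":
    data = load_split("./data/train_xs.jsonl", "./data/train_xs-labels.lst")
    ex, label = data[0]
    # Field names here are assumptions; check your copy of the data.
    print(ex.get("sentence"), ex.get("option1"), ex.get("option2"), label)
```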
## Run experiments

### Setup
- Download the dataset by running `download_winogrande.sh`.
- Install dependencies: `pip install -r requirements.txt`
### Training (fine-tuning)
- You can train your model via `./scripts/run_experiment.py` (see `sample_training.sh`), e.g.:

  ```sh
  export PYTHONPATH=$PYTHONPATH:$(pwd)
  python scripts/run_experiment.py \
      --model_type roberta_mc \
      --model_name_or_path roberta-large \
      --task_name winogrande \
      --do_eval \
      --do_lower_case \
      --data_dir ./data \
      --max_seq_length 80 \
      --per_gpu_eval_batch_size 4 \
      --per_gpu_train_batch_size 16 \
      --learning_rate 1e-5 \
      --num_train_epochs 3 \
      --output_dir ./output/models/ \
      --do_train \
      --logging_steps 4752 \
      --save_steps 4750 \
      --seed 42 \
      --data_cache_dir ./output/cache/ \
      --warmup_pct 0.1 \
      --evaluate_during_training
  ```
- If you have access to Beaker, you can run your experiments via `sh ./train_winogrande_on_bkr.sh`.
- Results will be stored under `./output/models/`.
### Prediction (on the test set)
- You can make predictions via `./scripts/run_experiment.py` directly (see `sample_prediction.sh`), e.g.:

  ```sh
  export PYTHONPATH=$PYTHONPATH:$(pwd)
  python scripts/run_experiment.py \
      --model_type roberta_mc \
      --model_name_or_path ./output/models \
      --task_name winogrande \
      --do_predict \
      --do_lower_case \
      --data_dir ./data \
      --max_seq_length 80 \
      --per_gpu_eval_batch_size 4 \
      --output_dir ./output/models/ \
      --data_cache_dir ./output/cache/
  ```
- If you have access to Beaker, you can run your experiments via `sh ./predict_winogrande_on_bkr.sh`.
- Results are stored in `./output/models/predictions_test.lst`.
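Each prediction run writes one label per line for a single model, while the leaderboard format (see below) expects five comma-separated columns, one per training size. A minimal sketch for stitching five such runs together; the per-size output directories are hypothetical, so adjust the paths to wherever you saved each run:

```python
# Merge five single-column prediction files (one per training size,
# one label per line) into the comma-separated five-column format.
# The "./output/models_<size>/" paths are hypothetical placeholders.
sizes = ["xs", "s", "m", "l", "xl"]
columns = []
for size in sizes:
    with open(f"./output/models_{size}/predictions_test.lst") as f:
        columns.append([line.strip() for line in f])

with open("predictions.lst", "w") as out:
    for row in zip(*columns):
        out.write(",".join(row) + "\n")
```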
## Evaluation
You can use `eval.py` for evaluation on the dev split, which yields `metrics.json`, e.g.:

```sh
python eval.py --preds_file ./YOUR_PREDICTIONS.lst --labels_file ./dev-labels.lst
```
In the prediction file, each line consists of the predictions (`1` or `2`) by the 5 training sets (ordered `xs`, `s`, `m`, `l`, `xl`, separated by commas) for each evaluation-set question:
```
2,1,1,1,1
1,1,2,2,2
1,1,1,1,1
.........
.........
```
Namely, the first column is the predictions of a model trained/fine-tuned on `train_xs.jsonl`, followed by the predictions of a model trained on `train_s.jsonl`, ..., and the last (fifth) column is the predictions of a model trained on `train_xl.jsonl`.
Please check out the sample submission file (`sample-submissions-labels.lst`) for reference.
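For a quick sanity check before submitting, you can replicate the per-size accuracy computation yourself. A minimal sketch assuming the five-column format above; this is not the official `eval.py`, whose exact metrics may differ, so use `eval.py` for reported numbers:

```python
# Per-training-size accuracy for a five-column predictions file
# against dev-labels.lst. Approximates what eval.py reports.
def per_size_accuracy(preds_file, labels_file):
    with open(labels_file) as f:
        gold = [line.strip() for line in f]
    with open(preds_file) as f:
        rows = [line.strip().split(",") for line in f]
    assert len(rows) == len(gold)
    sizes = ["xs", "s", "m", "l", "xl"]
    return {
        size: sum(r[i] == g for r, g in zip(rows, gold)) / len(gold)
        for i, size in enumerate(sizes)
    }

print(per_size_accuracy("./YOUR_PREDICTIONS.lst", "./data/dev-labels.lst"))
```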
## Submission to Leaderboard
You can submit your predictions on the test set to the leaderboard.
The submission file must be named `predictions.lst`; the format is the same as above.
## Reference
If you use this dataset, please cite the following paper:
```bibtex
@article{sakaguchi2019winogrande,
    title={WinoGrande: An Adversarial Winograd Schema Challenge at Scale},
    author={Sakaguchi, Keisuke and Bras, Ronan Le and Bhagavatula, Chandra and Choi, Yejin},
    journal={arXiv preprint arXiv:1907.10641},
    year={2019}
}
```
## License
WinoGrande (codebase) is licensed under the Apache License 2.0. The dataset is licensed under CC-BY.
## Questions?
Please file GitHub issues with your questions/suggestions. You may also ask us questions at our Google group.
## Contact
Email: keisukes[at]allenai.org