
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

This repository contains the official code of the paper: "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies", accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021.

Citation

@article{geva2021strategyqa,
  title = {{Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies}},
  author = {Geva, Mor and Khashabi, Daniel and Segal, Elad and Khot, Tushar and Roth, Dan and Berant, Jonathan},
  journal = {Transactions of the Association for Computational Linguistics (TACL)},
  year = {2021},
}

The following are instructions for reproducing the experiments reported in the paper on the StrategyQA dataset.


Quick Links

  1. Setup
  2. Training
  3. Prediction and Evaluation
  4. Download Links to Our Trained Models

Setup

Requirements

Our experiments were conducted in a Python 3.7 environment. To clone the repository and set up the environment, please run the following commands:

git clone https://github.com/eladsegal/strategyqa.git
cd strategyqa
pip install -r requirements.txt

StrategyQA dataset files

The official StrategyQA dataset files with a detailed description of their format can be found on the dataset page.
To train our baseline models, we created a 90%/10% random split of the official train set to get an unofficial train/dev split: data/strategyqa/[train/dev].json.
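
The split itself can be reproduced with a short script along these lines (a sketch only; the random seed and shuffling used to produce the released files are assumptions, so this will not regenerate the exact released split):

```python
import json
import random

def split_train_dev(train_path, train_out, dev_out, dev_fraction=0.1, seed=0):
    """Randomly split the official train set into unofficial train/dev files.
    The seed and shuffle order here are illustrative assumptions."""
    with open(train_path) as f:
        examples = json.load(f)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    with open(dev_out, "w") as f:
        json.dump(examples[:n_dev], f)
    with open(train_out, "w") as f:
        json.dump(examples[n_dev:], f)
```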

(Optional) Creating an Elasticsearch index of our corpus

A download link to our full corpus of Wikipedia paragraphs is available on the dataset page. A script for indexing the paragraphs into Elasticsearch is available here.
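
For reference, bulk indexing with the official Elasticsearch Python client follows this pattern (a sketch: the index name and the `id`/`content` field names are illustrative assumptions, not the repository's actual schema):

```python
import json

def build_bulk_actions(corpus_path, index_name="strategyqa-paragraphs"):
    """Yield bulk-index actions consumable by elasticsearch.helpers.bulk().
    Assumes one JSON object per line with hypothetical 'id' and 'content' fields."""
    with open(corpus_path) as f:
        for line in f:
            if not line.strip():
                continue
            para = json.loads(line)
            yield {
                "_index": index_name,
                "_id": para["id"],
                "_source": {"content": para["content"]},
            }

# Against a running Elasticsearch instance, the actions would be fed to
# elasticsearch.helpers.bulk(es_client, build_bulk_actions("corpus.jsonl")).
```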


Training

RoBERTa*

RoBERTa* is a RoBERTa model fine-tuned on auxiliary datasets, which we used as our base model when fine-tuning on StrategyQA. We trained RoBERTa* as follows:

  1. Download the twentyquestions dataset and extract it to data/, so you have data/twentyquestions/twentyquestions-[train/dev].jsonl.

  2. Download the BoolQ dataset and extract it to data/, so you have data/boolq/[train/dev].jsonl.

  3.  python run_scripts/train_RoBERTa_STAR.py -s OUTPUT_DIR -g "GPU"
    

    A trained RoBERTa* model can be found here.

Question Answering Models

The directory configs/strategy_qa contains configuration files for the question answering models described in the paper. To train a question answering model of a specific configuration, run the train.py script as follows:

python run_scripts/train.py --config-file configs/strategy_qa/CONFIG_NAME.jsonnet -s OUTPUT_DIR -g "GPU" -w [path to a RoBERTa* model (.tar.gz file)]

A trained model for each configuration can be found at https://storage.googleapis.com/ai2i/strategyqa/models/CONFIG_NAME.tar.gz,
and its evaluation scores on the dev set described in Setup can be found at https://storage.googleapis.com/ai2i/strategyqa/models/CONFIG_NAME.json.
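
The two URLs for a given configuration can be built programmatically, for example (the configuration name passed in is a placeholder; substitute the name of an actual file from configs/strategy_qa):

```python
def model_urls(config_name):
    """Build the download URLs for a trained model and its dev-set scores."""
    base = "https://storage.googleapis.com/ai2i/strategyqa/models/"
    return base + config_name + ".tar.gz", base + config_name + ".json"
```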

Figures depicting the resource dependency of the training procedures can be found here.

<img src="graphs/graphs_small.png">


Question Decomposition Model (BART-Decomp)

  1. Train the model:

    python run_scripts/train.py --config-file configs/decomposition/bart_decomp_strategy_qa.jsonnet -s OUTPUT_DIR -g "GPU"
    

    A trained model can be found here.

  2. Output predictions:

    python run_scripts/predict.py --model [path to a BART-Decomp model (.tar.gz file)] --data data/strategyqa/dev.json -g "GPU" --output-file data/strategyqa/generated/bart_decomp_dev_predictions.jsonl
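
The predictions file is written with one JSON object per line, so it can be consumed like this (a sketch; the field names inside each prediction object are not documented here and should be checked against the actual output):

```python
import json

def read_jsonl(path):
    """Read a .jsonl predictions file into a list of dicts, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```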
    

Iterative Answering of Decompositions

  1. Download the BoolQ dataset and extract it to data/, so you have data/boolq/[train/dev].jsonl.

  2. Download the SQuAD 2.0 dataset and extract it to data/, so you have data/squad/[train/dev]-v2.0.json.

  3. Append BoolQ to SQuAD:

    python -m tools.squadify_boolq data/boolq/train.jsonl data/squad/squad_v2_boolq_dataset_train.json --append-to data/squad/train-v2.0.json
    
    python -m tools.squadify_boolq data/boolq/dev.jsonl data/squad/squad_v2_boolq_dataset_dev.json --append-to data/squad/dev-v2.0.json
    
  4. Train a RoBERTa Extractive QA model on SQuAD and BoolQ:

    python run_scripts/train.py --config-file configs/squad/transformer_qa_large.jsonnet -s OUTPUT_DIR -g "GPU"
    

    A trained model can be found here.

  5. Replace the placeholders in the gold decomposition:

    python -m src.models.iterative.run_model -g [GPU (single only)] --qa-model-path ../experiments/publish/transformer_qa_large.tar.gz --paragraphs-source ORA-P --data data/strategyqa/train.json --output-predictions-file data/strategyqa/generated/transformer_qa_ORA-P_train_no_placeholders.json
    
    python -m src.models.iterative.run_model -g [GPU (single only)] --qa-model-path ../experiments/publish/transformer_qa_large.tar.gz --paragraphs-source ORA-P --data data/strategyqa/dev.json --output-predictions-file data/strategyqa/generated/transformer_qa_ORA-P_dev_no_placeholders.json
    

    This script allows for different paragraphs sources to be used (IR-Q/ORA-P/IR-ORA-D/IR-D), and can also work on generated decompositions instead of the gold ones (use --generated-decompositions-paths).
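
The placeholder-replacement logic can be illustrated roughly as follows (a sketch, assuming decomposition steps reference earlier answers with #1, #2, ... placeholders as in the StrategyQA decomposition format; the answering model is abstracted as a callback):

```python
import re

def answer_decomposition(steps, answer_fn):
    """Iteratively answer decomposition steps, substituting each '#k'
    placeholder with the answer to the k-th (1-based) earlier step.
    `answer_fn` stands in for the single-hop QA model."""
    answers = []
    for step in steps:
        resolved = re.sub(
            r"#(\d+)",
            lambda m: answers[int(m.group(1)) - 1],
            step,
        )
        answers.append(answer_fn(resolved))
    return answers
```

The final step's answer then serves as the answer to the original question.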


Prediction and Evaluation

The StrategyQA leaderboard is available here.

The official evaluation script can be found here.

Question Answering


Recall@10

  1. Output the retrieved paragraphs for the configuration.
    The format is a dictionary with "qid" as the key and a list of paragraph IDs as the value.

    python ir_evaluation/get_paragraphs_by_config.py --config-file configs/CONFIG_NAME.jsonnet --output-file OUTPUT_PATH --data DATA_PATH
    
  2. python ir_evaluation/recall@10.py --data DATA_PATH --retrieved-paragraphs [OUTPUT_PATH from the previous step]
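
The metric reduces to checking how much of the gold evidence appears among the top-10 retrieved paragraphs. A minimal sketch of one common definition (per-question fraction of gold paragraph IDs recovered, averaged over questions; the official script may define it differently):

```python
def recall_at_k(retrieved, gold, k=10):
    """Average, over questions, of the fraction of gold paragraph IDs
    found in the top-k retrieved IDs. Both arguments map qid -> list of IDs."""
    scores = []
    for qid, gold_ids in gold.items():
        if not gold_ids:
            continue
        top_k = set(retrieved.get(qid, [])[:k])
        scores.append(sum(pid in top_k for pid in gold_ids) / len(gold_ids))
    return sum(scores) / len(scores) if scores else 0.0
```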
    

Download Links to Our Trained Models