Single-hop Reading Comprehension Model

This code is for the following paper:

Sewon Min*, Eric Wallace*, Sameer Singh, Matt Gardner, Hannaneh Hajishirzi, Luke Zettlemoyer. Compositional Questions Do Not Necessitate Multi-hop Reasoning. In: Proceedings of ACL (short). Florence, Italy. 2019.

@inproceedings{min2019compositional,
  title = {Compositional Questions Do Not Necessitate Multi-hop Reasoning},
  author = {Min, Sewon and Wallace, Eric and Singh, Sameer and Gardner, Matt and Hajishirzi, Hannaneh and Zettlemoyer, Luke},
  booktitle = {ACL},
  year = {2019}
}

This is a general-purpose reading comprehension model based on BERT, which takes a set of paragraphs as input but is incapable of cross-paragraph reasoning.

[Figure: model diagram]

This is primarily for HotpotQA. However, you can use this code for any task whose input is a question and one or more paragraphs, and whose output is an answer to the question (a span from a paragraph, yes, or no).

For any question, please contact Sewon Min and Eric Wallace.

Acknowledgement

This code is based on a PyTorch version of Google's pretrained BERT model, from an earlier version of Hugging Face's PyTorch BERT.

Requirements

Preprocessing

  1. Download Pretrained BERT and Convert to PyTorch

There are multiple BERT models: BERT-Base Uncased, BERT-Large Uncased, BERT-Base Cased and BERT-Large Cased. This code is tested on BERT-Base Uncased. Using the larger model may improve results.

First, download the pre-trained BERT TensorFlow model from here (the uncased base version, converted from Google's release). Unzip the file and rename the directory to bert.
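
If you want to script the unzip-and-rename step, a minimal sketch is below. The archive name uncased.zip and the assumption that it extracts into a single top-level directory are hypothetical; adjust them to whatever the download actually produces.

# Minimal sketch of the unzip-and-rename step. The archive name
# (uncased.zip) and the single top-level directory are assumptions.
import os
import zipfile

with zipfile.ZipFile("uncased.zip") as zf:
    top = zf.namelist()[0].split("/")[0]  # name of the extracted directory
    zf.extractall(".")

os.rename(top, "bert")  # the commands below expect the directory ./bert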

  2. Convert HotpotQA into SQuAD style

To use the same model in all the different datasets, we convert datasets into the SQuAD format. To run on HotpotQA, create a directory and download the training and validation sets into the directory. Then run:

python convert_hotpot2squad.py --data_dir PATH_TO_DATA_DIR --task hotpot-all

You can try different task settings using the --task flag (see the code for details).
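
For reference, here is a minimal sketch of what a converted entry looks like, assuming the standard SQuAD v1.1 schema; the exact fields (and how yes/no answers are encoded) are determined by convert_hotpot2squad.py, so treat this as illustrative only.

# Hypothetical converted entry in the standard SQuAD v1.1 layout.
example = {
    "data": [{
        "title": "Example article",
        "paragraphs": [{
            "context": "ACL 2019 was held in Florence, Italy.",
            "qas": [{
                "id": "hotpot-example-0",
                "question": "Where was ACL 2019 held?",
                "answers": [{"text": "Florence", "answer_start": 21}],
            }],
        }],
    }],
    "version": "1.1",
}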

Training

The main entry point for training is main.py. To train the model, run:

python main.py --do_train --output_dir out/hotpot \
          --train_file PATH_TO_TRAIN_FILE \
          --predict_file PATH_TO_DEV_FILE \
          --init_checkpoint PATH_TO_PRETRAINED_BERT \
          --bert_config_file PATH_TO_BERT_CONFIG_FILE \
          --vocab_file PATH_TO_BERT_VOCAB_FILE

Make sure PATH_TO_TRAIN_FILE and PATH_TO_DEV_FILE are set to the output from the convert_hotpot2squad.py script (usually data/hotpot-all/train.json). This code will store the best model in out/hotpot/best-model.pt.
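
As a quick sanity check after training, you can inspect the saved checkpoint with plain PyTorch. The sketch below assumes best-model.pt holds a state dict (or a dict wrapping one under a state_dict key); the repo does not document the exact format.

import torch

# Load on CPU so no GPU is needed just to inspect the checkpoint.
state = torch.load("out/hotpot/best-model.pt", map_location="cpu")

# If the checkpoint wraps the weights, unwrap the state-dict-like entry.
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]

print(len(state), "entries; first keys:", list(state)[:3])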

To run inference:

python main.py --do_predict --output_dir out/hotpot \
        --predict_file PATH_TO_DEV_FILE \
        --init_checkpoint out/hotpot/best-model.pt \
        --predict_batch_size 32 --max_seq_length 300 --prefix dev_

This will store dev_predictions.json and dev_nbest_predictions.json in the out/hotpot directory (--prefix sets the filename prefix for the stored files).

PREFIX_predictions.json is a dictionary that maps each example id to the model's prediction and the ground-truth answer. PREFIX_nbest_predictions.json is the same, except each value contains the model's top-k predictions together with their logits, probabilities, the no-answer score, and the evidence (the paragraph the answer comes from). You can adjust which values are stored in evaluate_qa.py.
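
As an example of consuming these files, the sketch below computes a crude exact-match score from dev_predictions.json. It assumes each value is a (prediction, ground-truth) pair as described above, which may differ from the actual serialization, so check evaluate_qa.py for the authoritative scoring.

import json

with open("out/hotpot/dev_predictions.json") as f:
    preds = json.load(f)

def normalize(s):
    # Crude normalization: lowercase and collapse whitespace.
    return " ".join(str(s).lower().split())

# Assumed layout: {example_id: [prediction, groundtruth], ...}
em = sum(normalize(p) == normalize(g) for p, g in preds.values()) / len(preds)
print("Exact match over %d examples: %.1f%%" % (len(preds), 100 * em))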

Other potentially useful flags:

When running --do_predict, you can run inference with an ensemble of models by passing several comma-separated paths to --init_checkpoint, e.g. --init_checkpoint out/hotpot1/best-model.pt,out/hotpot2/best-model.pt. The code will then run inference with every model and vote to obtain the final output (a rough sketch of such voting appears at the end of this section).

Similarly, if you want to combine two or more datasets for training and inference, you can specify them the same way. For example, to train on SQuAD and HotpotQA jointly, add --train_file SQUAD_TRAIN_FILE,HOTPOT_TRAIN_FILE --predict_file SQUAD_DEV_FILE,HOTPOT_DEV_FILE, and the code will train and test the model on the combined data.
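
For intuition, the ensemble voting mentioned above might look roughly like the sketch below: a simple majority vote over each model's normalized answer string. This is a hypothetical illustration, not the repo's actual aggregation (which may, for instance, weight candidates by probability; see the code).

from collections import Counter

def vote(predictions_per_model):
    # predictions_per_model: one {example_id: answer} dict per model.
    final = {}
    for qid in predictions_per_model[0]:
        answers = [preds[qid].strip().lower() for preds in predictions_per_model]
        final[qid] = Counter(answers).most_common(1)[0][0]  # majority answer
    return final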