Awesome
ARC-Solvers
Library of baseline solvers for AI2 Reasoning Challenge (ARC) Set (http://data.allenai.org/arc/). These solvers retrieve relevant sentences from a large text corpus (ARC_Corpus.txt in the dataset), and use two types of models to predict the correct answer.
- An entailment-based model that computes the entailment score for each
(retrieved sentence, question+answer choice as an assertion)
pair and scores each answer choice based on the highest-scoring sentence. - A reading comprehension model (BiDAF) that converts the retrieved sentences into a paragraph per question. The model is used to predict the best answer span and each answer choice is scored based on the overlap with the predicted span.
Setup environment
- Create the
arc_solvers
environment using Anaconda
conda create -n arc_solvers python=3.6
- Activate the environment
source activate arc_solvers
- Install the requirements in the environment:
sh scripts/install_requirements.sh
- Install pytorch as per instructions on http://pytorch.org/. Command as of Feb. 26, 2018:
conda install pytorch torchvision -c pytorch
Setup data/models
- Download the data and models into
data/
folder. This will also build the ElasticSearch index (assumes ElasticSearch 6+ is running onES_HOST
machine defined in the script)
sh scripts/download_data.sh
- Download and prepare embeddings. This will download glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/ and convert it to glove.840B.300d.txt.gz which is readable from AllenNLP
sh download_and_prepare_glove.sh
Running baseline models
Run the entailment-based baseline solvers against a question set using scripts/evaluate_solver.sh
Running a pre-trained DGEM model
For example, to evaluate the DGEM model on the Challenge Set, run:
sh scripts/evaluate_solver.sh \
data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
data/ARC-V1-Models-Aug2018/dgem/
Change dgem
to decompatt
to test the Decomposable Attention model.
Running a pre-trained BiDAF model
To evaluate the BiDAF model, use the evaluate_bidaf.sh
script
sh scripts/evaluate_bidaf.sh \
data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
data/ARC-V1-Models-Aug2018/bidaf/
Training and evaluating the BiLSTM Max-out with Question to Choices Max Attention
This model implements an attention interaction between the context-encoded representations of the question and the choices. The model is described here.
To train the model, download the data and word embeddings (see Setup data/models above).
Evaluate the trained model:
python arc_solvers/run.py evaluate \
--archive_file data/ARC-V1-Models-Aug2018/max_att/model.tar.gz \
--evaluation_data_file data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl
or
Train a new model:
python arc_solvers/run.py train \
-s trained_models/qa_multi_question_to_choices/serialization/ \
arc_solvers/training_config/qa/multi_choice/reader_qa_multi_choice_max_att_ARC_Chellenge_full.json
Running against a new question set
To run the baseline solvers against a new question set, create a file using the JSONL format. For example:
{
"id":"Mercury_SC_415702",
"question": {
"stem":"George wants to warm his hands quickly by rubbing them. Which skin surface will
produce the most heat?",
"choices":[
{"text":"dry palms","label":"A"},
{"text":"wet palms","label":"B"},
{"text":"palms covered with oil","label":"C"},
{"text":"palms covered with lotion","label":"D"}
]
},
"answerKey":"A"
}
Run the evaluation scripts on this new file using the same commands as above.
Running a new Entailment-based model
To run a new entailment model (implemented using AllenNLP), you need to
-
Create a
Predictor
that converts the input JSON to anInstance
expected by your entailment model. See DecompAttPredictor for an example. -
Add your custom predictor to the predictor overrides For example, if your new model is registered using
my_awesome_model
and the predictor is registered usingmy_awesome_predictor
, add"my_awesome_model": "my_awesome_predictor"
to thepredictor_overrides
. -
Run the
evaluate_solver.sh
script with your learned model inmy_awesome_model/model.tar.gz
:
sh scripts/evaluate_solver.sh \
data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
my_awesome_model/
Running a new Reading Comprehension model
To run a new reading comprehension (RC) model (implemented using AllenNLP), you need to
-
Create a
Predictor
that converts the input JSON to anInstance
expected by your RC model. See BidafQaPredictor for an example. -
Add your custom predictor to the predictor overrides For example, if your new model is registered using
my_awesome_model
and the predictor is registered usingmy_awesome_predictor
, add"my_awesome_model": "my_awesome_predictor"
to thepredictor_overrides
. -
Run the
evaluate_bidaf.sh
script with your learned model inmy_awesome_model/model.tar.gz
:
sh scripts/evaluate_solver.sh \
data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
my_awesome_model/