HEAD-QA

NEWS! HEAD-QA can now be imported from Hugging Face Datasets. Many thanks to Maria Grandury for adding it.

This repository contains the sources used in "HEAD-QA: A Healthcare Dataset for Complex Reasoning" (ACL, 2019).

HEAD-QA is a multi-choice HEAlthcare Dataset. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. They are designed by the Ministerio de Sanidad, Consumo y Bienestar Social, which also provides direct access to the exams of the last 5 years (in Spanish).

Date of the last update of the reused documents: January 14th, 2019.

HEAD-QA tries to make these questions accessible to the Natural Language Processing community. We hope it is a useful resource towards achieving better QA systems. The dataset contains questions about the following topics:

Requirements

Requirements for the ARC-Solvers

Installation

We recommend creating a virtualenv first (e.g. virtualenv -p python3.6 head-qa). The script install.sh automatically installs the packages listed above, assuming that you have already created and activated your virtualenv (tested on Ubuntu 18.04, 64 bits). The script install_arc_solvers.sh installs the dependencies needed to run the ARC-solvers (Clark et al., 2019).

We recommend using a separate virtualenv for the ARC-solvers, since dependencies such as the PyTorch version might conflict.

Datasets

ES_HEAD dataset

EN_HEAD dataset

Each dataset contains:

Data (images, pdfs, etc). Note that these are medical images and some of them might have sensitive content.

Run the baselines: Length, Random, Blind_n, IR and DrQA

Available baselines for Spanish HEAD-QA: Length, Random, Blind_n, IR

Available baselines for English HEAD-QA (HEAD-QA_EN): Length, Random, Blind_n, IR, DrQA

Description of the baselines:
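As a rough illustration of the three simplest baselines, here is a minimal sketch. It assumes that Length picks the longest answer, Random samples an answer uniformly, and Blind_n always picks the n-th option. These readings are inferred from the baseline names, and the code in this repository may differ. The question record and its field names (qtext, answers, aid, atext) are hypothetical.

```python
import random

# Hypothetical question record; the field names are assumptions for this sketch.
question = {
    "qtext": "Which vitamin is synthesized in the skin after exposure to sunlight?",
    "answers": [
        {"aid": 1, "atext": "Vitamin A"},
        {"aid": 2, "atext": "Vitamin D"},
        {"aid": 3, "atext": "Vitamin B12 (cobalamin)"},
        {"aid": 4, "atext": "Vitamin K"},
    ],
}


def length_baseline(q):
    """Return the id of the longest answer (the first one wins on ties)."""
    return max(q["answers"], key=lambda a: len(a["atext"]))["aid"]


def random_baseline(q, rng):
    """Return the id of an answer sampled uniformly at random."""
    return rng.choice(q["answers"])["aid"]


def blind_baseline(q, n):
    """Return the id of the n-th answer (1-based), ignoring the question text."""
    return q["answers"][n - 1]["aid"]


print("Length:", length_baseline(question))
print("Random:", random_baseline(question, random.Random(0)))
print("Blind_2:", blind_baseline(question, 2))
```

None of these baselines reads the question itself, which is why they are useful as a lower bound for the IR and DrQA systems.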

Creating an inverted index

IR and DrQA require an inverted index to be created in advance. This is done using wikiextractor and following DrQA's Document Reader guidelines (visit their README.md for a detailed explanation of how to create the index; here we summarize the main steps):

In this work we used the following Wikipedia dumps:

Alternatively, you can use the current Wikipedia dumps maintained at https://dumps.wikimedia.org/

PYTHONPATH="$HOME/git/wikiextractor" python $HOME/git/wikiextractor/WikiExtractor.py $PATH_WIKIPEDIA_DUMP -o $PATH_WIKI_JSON --json
PYTHONPATH="$HOME/git/DrQA/" python $HOME/git/DrQA/scripts/retriever/build_db.py $PATH_WIKI_JSON $PATH_DB
PYTHONPATH="$HOME/git/DrQA/" python $HOME/git/DrQA/scripts/retriever/build_tfidf.py --num-workers 2 $PATH_DB $PATH_TFIDF

The model created in $PATH_TFIDF is what will be used as our inverted index. In case they are of any help, the indexes we used in our work can be found here.
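If you prefer to drive the three indexing steps from Python, the sketch below composes the same commands as argument lists and prints them as a dry run. The function names index_commands and run_all and every path in the example are hypothetical placeholders; only the script names and flags are taken from the commands above.

```python
import os
import shlex
import subprocess


def index_commands(wiki_dump, wiki_json, db_path, tfidf_path,
                   wikiextractor_dir, drqa_dir, num_workers=2):
    """Return (extra_env, argv) pairs mirroring the three shell commands above."""
    return [
        # 1. Extract the Wikipedia dump into JSON files.
        ({"PYTHONPATH": wikiextractor_dir},
         ["python", f"{wikiextractor_dir}/WikiExtractor.py",
          wiki_dump, "-o", wiki_json, "--json"]),
        # 2. Store the extracted documents in a DrQA document database.
        ({"PYTHONPATH": drqa_dir},
         ["python", f"{drqa_dir}/scripts/retriever/build_db.py",
          wiki_json, db_path]),
        # 3. Build the TF-IDF model that serves as the inverted index.
        ({"PYTHONPATH": drqa_dir},
         ["python", f"{drqa_dir}/scripts/retriever/build_tfidf.py",
          "--num-workers", str(num_workers), db_path, tfidf_path]),
    ]


def run_all(steps):
    """Run each step in order and stop at the first failure."""
    for extra_env, argv in steps:
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)


# Dry run with placeholder paths: print the commands instead of executing them.
example_steps = index_commands(
    "wiki-dump.xml.bz2", "wiki_json", "wiki.db", "wiki_tfidf",
    wikiextractor_dir="/path/to/wikiextractor", drqa_dir="/path/to/DrQA")
for extra_env, argv in example_steps:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Calling run_all(example_steps) with real paths would execute the steps one after another; each step depends on the output of the previous one, so the order matters.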

Updating DrQA's tokenizer

By default, DrQA uses the CoreNLP tokenizer. In this work we used the SpacyTokenizer instead. To use it, go to DrQA/drqa/pipeline/__init__.py and make sure DEFAULTS is set as shown below. We also used multitask.mdl as the reader_model; make sure you downloaded it when you installed DrQA.

from ..tokenizers import CoreNLPTokenizer, SpacyTokenizer

DEFAULTS = {
    'tokenizer': SpacyTokenizer,  # replaces the default CoreNLPTokenizer
    'ranker': TfidfDocRanker,
    'db': DocDB,
    'reader_model': os.path.join(DATA_DIR, 'reader/multitask.mdl'),
}

Create a configuration file

#A configuration file for Spanish

lang=es
eval=eval.py
#Path to your DrQA's installation
drqa=DrQA/ 
use_stopwords=False
ignore_questions=False 
negative_questions=False 
#The folder containing the .gold files
path_solutions=HEAD/ 

es_head=HEAD/HEAD.json #HEAD-QA in json format
#The inverted index that we have previously created.
es_retriever=wikipedia//home/david.vilares/Escritorio/proof-head-qa-code/head-qa/wikipedia/eswiki-20180620-articles.tfidf 
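To sanity-check a configuration file before launching a run, it can help to load it and inspect the values. The sketch below is one way to read this key=value format, assuming that '#' always starts a comment and never appears inside a value; it is not necessarily how run.py parses the file, and parse_config is a hypothetical helper.

```python
def parse_config(text):
    """Parse key=value lines; '#' starts a comment (full-line or trailing)."""
    config = {}
    for raw_line in text.splitlines():
        # Drop the comment part, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


sample = """\
#A configuration file for Spanish
lang=es
eval=eval.py
use_stopwords=False
es_head=HEAD/HEAD.json #HEAD-QA in json format
"""

config = parse_config(sample)
for key, value in config.items():
    print(f"{key} -> {value!r}")
```

All values are kept as strings here, so flags such as use_stopwords would still need to be converted to booleans by the caller.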

After this, you should be able to run the script run.py:

python run.py --config configs/configuration$LANG.config --answerer $ANSWERER --output $OUTPUT

Running the ARC-solvers

We also ran the ARC-Solvers used in the ARC challenge (Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., & Tafjord, O. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge.). To install and run them, follow these steps:

1 - Follow the ARC-solvers README.md instructions to create a virtualenv, create the index and download the models and resources:

NOTE that instead of using their ARC_corpus.txt as the inverted index, we again used Wikipedia. If you also want to use Wikipedia you need to do two things:

  1. Make sure you have downloaded our Wikipedia corpus in txt format.
  2. Modify the file ARC-Solvers/scripts/download_data.sh and change the argument specifying the corpus ARC_corpus.txt to the path where you have stored the Wikipedia corpus.

NOTE 2: ARC-Solvers need Elasticsearch 6+ to download the data. Download it and, to run it, execute:

cd elasticsearch-<version>
./bin/elasticsearch  
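Before continuing, you may want to confirm that Elasticsearch is actually up. The sketch below queries the root endpoint and reads the reported version. It assumes the default address http://localhost:9200 with no authentication, and elasticsearch_version is a hypothetical helper, not part of the ARC-Solvers.

```python
import json
import urllib.error
import urllib.request


def elasticsearch_version(url="http://localhost:9200", timeout=2.0):
    """Return the version string reported by the server, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON reply.
        return None
    return info.get("version", {}).get("number")


if __name__ == "__main__":
    version = elasticsearch_version()
    if version is None:
        print("Elasticsearch is not reachable; start it before continuing.")
    else:
        print(f"Elasticsearch {version} is running.")
```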

2 - Convert HEAD_EN.json into the input format for the ARC solvers

PYTHONPATH=. python scripts/head2ARCformat.py --input HEAD_EN/HEAD_EN.json --output HEAD_ARC/
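For orientation, the sketch below shows the kind of mapping such a conversion performs for a single question. The output mirrors the JSON Lines layout of the ARC question files (id, question.stem, question.choices, answerKey). The HEAD-QA field names (qid, qtext, ra, answers, aid, atext), the exam identifier, the sample question and the choice of labels are assumptions made for illustration; scripts/head2ARCformat.py may do this differently.

```python
import json


def head_question_to_arc(question, exam_id):
    """Map one HEAD-QA style question (assumed fields) to an ARC-style record."""
    return {
        "id": f"{exam_id}_{question['qid']}",
        "question": {
            "stem": question["qtext"],
            "choices": [
                {"text": answer["atext"], "label": str(answer["aid"])}
                for answer in question["answers"]
            ],
        },
        # 'ra' is assumed to hold the id of the right answer.
        "answerKey": str(question["ra"]),
    }


# Hypothetical input record, for illustration only.
sample_question = {
    "qid": "7",
    "qtext": "Which vitamin is synthesized in the skin after exposure to sunlight?",
    "ra": "2",
    "answers": [
        {"aid": 1, "atext": "Vitamin A"},
        {"aid": 2, "atext": "Vitamin D"},
    ],
}

# One JSON object per line, as expected for a JSON Lines file.
print(json.dumps(head_question_to_arc(sample_question, "exam_example")))
```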

3 - Run the models using the evaluation scripts provided together with the ARC solvers:

cd ARC-Solvers
sh scripts/evaluate_solver.sh ../HEAD_ARC/HEAD_EN.arc.txt data/ARC-V1-Models-Aug2018/dgem/
sh scripts/evaluate_solver.sh ../HEAD_ARC/HEAD_EN.arc.txt data/ARC-V1-Models-Aug2018/decompatt/
sh scripts/evaluate_bidaf.sh ../HEAD_ARC/HEAD_EN.arc.txt data/ARC-V1-Models-Aug2018/bidaf/

4 - Compute the scores for HEAD-QA, based on the ARC-solvers outputs

cd ..
python evaluate_arc_solvers.py --arc_results $PATH_RESULTS --output $PATH_OUTPUT_DIR --disambiguator length --breakdown_results --path_eval eval.py

where:

Issues

We had problems running some models because the question-tuplizer.jar used in the ARC-solvers could not be found. If you experience the error Error: Unable to access jarfile data/ARC-V1-Models-Feb2018/question-tuplizer.jar, we recommend editing the file scripts/evaluate_solver.sh and replacing the line java -Xmx8G -jar data/ARC-V1-Models-Feb2018/question-tuplizer.jar with java -Xmx8G -jar data/ARC-V1-Models-Aug2018/question-tuplizer.jar

We also had problems running the dgem baseline. The default torch version installed when following the instructions in the ARC-solvers README.md is 0.4.1. To make it work we needed to install torch 0.3.1 instead.
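The jar path fix can also be applied with a few lines of Python instead of editing the script by hand. This is a minimal sketch: patch_tuplizer_path is a hypothetical helper, and the location ARC-Solvers/scripts/evaluate_solver.sh assumes you run it from the directory that contains your ARC-Solvers checkout.

```python
from pathlib import Path

OLD_JAR = "data/ARC-V1-Models-Feb2018/question-tuplizer.jar"
NEW_JAR = "data/ARC-V1-Models-Aug2018/question-tuplizer.jar"


def patch_tuplizer_path(script_path):
    """Rewrite the jar path in place; return the number of replacements made."""
    path = Path(script_path)
    text = path.read_text(encoding="utf-8")
    count = text.count(OLD_JAR)
    if count:
        path.write_text(text.replace(OLD_JAR, NEW_JAR), encoding="utf-8")
    return count


if __name__ == "__main__":
    script = Path("ARC-Solvers/scripts/evaluate_solver.sh")  # assumed location
    if script.exists():
        print(f"Replaced {patch_tuplizer_path(script)} occurrence(s).")
    else:
        print(f"{script} not found; adjust the path to your checkout.")
```

Running it twice is harmless: the second call finds nothing left to replace and leaves the file untouched.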

Acknowledgements

This work has received funding from the European Research Council (ERC), under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150).

References

Vilares, David and Gómez-Rodríguez, Carlos. "HEAD-QA: A Healthcare Dataset for Complex Reasoning", to appear, ACL 2019.