A Discrete Hard EM Approach for Weakly Supervised Question Answering

This is the original implementation of the following paper.

Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer. A Discrete Hard EM Approach for Weakly Supervised Question Answering. In: Proceedings of EMNLP (long). 2019

@inproceedings{min2019discrete,
  title={A Discrete Hard EM Approach for Weakly Supervised Question Answering},
  author={Min, Sewon and Chen, Danqi and Hajishirzi, Hannaneh and Zettlemoyer, Luke},
  booktitle={EMNLP},
  year={2019}
}

You can use hard EM updates for any weakly supervised QA task where a precomputed set of candidate solutions can be obtained, and with any model architecture. This repository contains example code for open-domain question answering with a BERT QA model. The base code is from Hugging Face's PyTorch-Transformers.
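The difference between the training objectives is easiest to see at the level of the per-question loss. Below is a minimal PyTorch sketch (not the repository's code): given the model's log-likelihood for each precomputed candidate answer, MML marginalizes over all candidates, while hard EM keeps only the most likely one.

# Minimal sketch of the two objectives over a precomputed candidate set.
# `candidate_log_probs` is assumed to hold, for one question, the model's
# log-likelihood log P(z_i | x) of every candidate solution z_i.
import torch

def mml_loss(candidate_log_probs: torch.Tensor) -> torch.Tensor:
    # Maximum marginal likelihood: minimize -log sum_i P(z_i | x).
    return -torch.logsumexp(candidate_log_probs, dim=-1)

def hard_em_loss(candidate_log_probs: torch.Tensor) -> torch.Tensor:
    # Hard EM: pick the most likely candidate under the current model (E-step)
    # and maximize only its likelihood (M-step), i.e. minimize -max_i log P(z_i | x).
    return -candidate_log_probs.max(dim=-1).values

# Example with three candidate answers for one question:
log_probs = torch.log(torch.tensor([0.1, 0.6, 0.3]))
print(mml_loss(log_probs), hard_em_loss(log_probs))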

Code for other tasks is coming soon, stay tuned!

In the paper, we experiment on six QA datasets in three different categories.

Below are the results reported in the paper (all on the test set).

Dataset         TriviaQA  NarrativeQA  TriviaQA-open  NaturalQuestions-open  DROP  WikiSQL
First-only      64.9      57.4         48.1           23.6                   42.9  -
MML             65.5      56.1         47.4           25.8                   39.7  70.5
Hard-EM (Ours)  67.1      58.8         50.9           28.1                   52.8  83.9
SOTA            71.4      54.7         47.1           26.5                   43.8  74.8

SOTA results are from Wang et al 2018, Nishida et al 2019, Lee et al 2019, Lee et al 2019, Dua et al 2019, and Agarwal et al 2019, respectively.

Quick Run on open-domain QA

Requirements: Python 3.5, PyTorch 1.1.0

Download Data and BERT, and unzip them in the current directory.

Then, you can do

# NQ
./run.sh nq first-only
./run.sh nq mml
./run.sh nq hard-em 8000
# TriviaQA
./run.sh triviaqa first-only
./run.sh triviaqa mml
./run.sh triviaqa hard-em 4000

Details about data

Here we release the preprocessed data and source code for our experiments on two open-domain QA datasets, NaturalQuestions-open (Kwiatkowski et al 2019) and TriviaQA-open (Joshi et al 2017).

For both datasets, we treat the dev set as the test set and split the train set 90/10 into training and development sets, following the conventions used in Chen et al 2017 and Lee et al 2019. For NaturalQuestions, following Lee et al 2019, we take the subset of questions whose short answers are up to 5 tokens long.

You can download this data from here. Each datapoint contains an example id, the question, and a list of answer strings.

For preprocessing, we retrieve paragraphs for each question with TF-IDF (for document retrieval, using DrQA from Chen et al 2017) and BM25 (for further paragraph retrieval). We filter out train examples for which the retriever fails to retrieve any paragraph containing the answer text. The preprocessed data with retrieved paragraphs can be downloaded from here.
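For illustration, here is a minimal sketch of that filtering step (not the repository's actual preprocessing code). It assumes one JSON-encoded example per line, with the context and final_answers fields described in the next section, and keeps only examples whose retrieved paragraphs contain at least one answer string.

# Minimal sketch of answer-based filtering over retrieved paragraphs.
import json

def paragraph_has_answer(paragraph_words, answer_text):
    # Naive check: does the answer appear in the whitespace-joined paragraph?
    return answer_text.lower() in " ".join(paragraph_words).lower()

def filter_unanswerable(path_in, path_out):
    kept = 0
    with open(path_in) as f_in, open(path_out, "w") as f_out:
        for line in f_in:
            example = json.loads(line)
            if any(paragraph_has_answer(paragraph, answer)
                   for paragraph in example["context"]
                   for answer in example["final_answers"]):
                f_out.write(json.dumps(example) + "\n")
                kept += 1
    return kept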

How to use your own preprocessed data

To use your own data, each line of the data file should be a dictionary (decodable by json) containing the fields id, question, context, answers, and final_answers, as in the example below.

Example:

{
  "id": "user-input-0",
  "question": "Which city is University of Washington located in?",
  "context": [["The", "University", "of", "Washington", "is", "a", "public", "research", "university", "in", "Seattle", ",", "Washington", ...],
              ["University", "of", "Washington", "has", "been", "affiliated", "with", "many", "notable", "alumni", "and", "faculty", ",", "including", ...]],
  "answers": [[{"text": "Seattle", "word_start": 10, "word_end": 10}, {"text": "Seattle, Washington", "word_start": 10, "word_end": 12}],
              []],
  "final_answers": ["Seattle", "Seattle, Washington"]
}
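If you generate such a file yourself, a small sanity check over the span indices can catch off-by-one errors. Below is a minimal sketch, assuming one JSON-encoded example per line; the output path my_data.jsonl is only an illustration.

# Minimal sketch for writing and sanity-checking your own data file.
import json

example = {
    "id": "user-input-0",
    "question": "Which city is University of Washington located in?",
    "context": [["The", "University", "of", "Washington", "is", "a", "public",
                 "research", "university", "in", "Seattle", ",", "Washington"]],
    "answers": [[{"text": "Seattle", "word_start": 10, "word_end": 10}]],
    "final_answers": ["Seattle"],
}

# Sanity check: every answer span must point at valid word indices in its paragraph.
for paragraph, spans in zip(example["context"], example["answers"]):
    for span in spans:
        assert 0 <= span["word_start"] <= span["word_end"] < len(paragraph)

with open("my_data.jsonl", "w") as f:  # hypothetical output path
    f.write(json.dumps(example) + "\n")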

Details about the model

The model architecture is exactly the same as the model of Min et al 2019; we only modify the loss function to obtain the different variants. You can check the exact command lines for training and evaluating the model, along with the flags they use, in run.sh.

Contact

For any question, please contact Sewon Min or post a Github issue.