UnifiedQA

You may want to check out:

- Update (Feb '22): UnifiedQA-v2

Using the models in PyTorch/HuggingFace

You can easily load the models with Transformers >= 3.1, instead of downloading them manually. The models are listed on this page. Here is the list of model names hosted on the HuggingFace model hub:

| Model Name | HuggingFace ID(s) |
| --- | --- |
| UnifiedQA (T5) - small | allenai/unifiedqa-t5-small |
| UnifiedQA (T5) - base | allenai/unifiedqa-t5-base |
| UnifiedQA (T5) - large | allenai/unifiedqa-t5-large |
| UnifiedQA (T5) - 3B | allenai/unifiedqa-t5-3b |
| UnifiedQA (T5) - 11B | allenai/unifiedqa-t5-11b |
| UnifiedQA-v2 (T5) - small | allenai/unifiedqa-v2-t5-small-[ckpt] |
| UnifiedQA-v2 (T5) - base | allenai/unifiedqa-v2-t5-base-[ckpt] |
| UnifiedQA-v2 (T5) - large | allenai/unifiedqa-v2-t5-large-[ckpt] |
| UnifiedQA-v2 (T5) - 3B | allenai/unifiedqa-v2-t5-3b-[ckpt] |
| UnifiedQA-v2 (T5) - 11B | allenai/unifiedqa-v2-t5-11b-[ckpt] |

where [ckpt] can be either 1251000 or 1363200. The numbers in the paper are reported based on the 1251000 checkpoint.

Here is an example:

from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-small" # you can specify the model size here
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)

For instance, here is how you can use it to answer a multiple-choice question:

run_model("which is best conductor? \\n (a) iron (b) feather")

which gives: ['iron']

run_model("scott filled a tray with juice and put it in a freezer. the next day, scott opened the freezer. how did the juice most likely change? \\n (a) it condensed. (b) it evaporated. (c) it became a gas. (d) it became a solid.")

which produces: ['it condensed.'].

Note that you can also pass text-generation arguments to the run_model(...) function:

run_model("which is best conductor? \\n (a) iron (b) feather (c) wood (d) plastic",
         temperature=0.9, num_return_sequences=4, num_beams=20)

Feeding data into UnifiedQA

Datasets should be converted into a text-in/text-out format.

Here are several examples:

| Dataset | Encoded Input | Encoded Output |
| --- | --- | --- |
| SQuAD 1.1 (extractive QA) | At what speed did the turbine operate? \n (Nikola_Tesla) On his 50th birthday in 1906, Tesla demonstrated his 200 horsepower (150 kilowatts) 16,000 rpm bladeless turbine. ... | 16,000 rpm |
| NarrativeQA (Abstractive QA) | What does a drink from narcissus's spring cause the drinker to do? \n Mercury has awakened Echo, who weeps for Narcissus, and states that a drink from Narcissus's spring causes the drinkers to ''Grow dotingly enamored of themselves.'' ... | fall in love with themselves |
| ARC-challenge (Multiple-choice QA) | What does photosynthesis produce that helps plants grow? \n (A) water (B) oxygen (C) protein (D) sugar | sugar |
| MCTest (Multiple-choice QA) | Who was Billy? \n (A) The skinny kid (B) A teacher (C) A little kid (D) The big kid \n Billy was like a king on the school yard. A king without a queen. He was the biggest kid in our grade, so he made all the rules during recess. ... | The big kid |
| BoolQ (Yes-no QA) | Was America the first country to have a president? \n (President) The first usage of the word president to denote the highest official in a government was during the Commonwealth of England ... | no |
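The multiple-choice encoding above can be sketched as a small helper. This is a hypothetical function (not part of the released UnifiedQA codebase; the released conversion script is the authoritative reference); it lowercases the fields and joins them with the literal "\n" separator, matching the run_model examples earlier:

```python
def encode_multiple_choice(question, choices, context=None):
    """Build a UnifiedQA-style text-in string for a multiple-choice question.

    Hypothetical helper for illustration: joins the question, the lettered
    candidate answers, and an optional context paragraph with the literal
    backslash-n separator (written "\\n" in Python source).
    """
    letters = "abcdefghijklmnopqrstuvwxyz"
    choice_str = " ".join(
        f"({letters[i]}) {c.lower()}" for i, c in enumerate(choices)
    )
    parts = [question.lower(), choice_str]
    if context is not None:
        parts.append(context.lower())
    return " \\n ".join(parts)

encode_multiple_choice("Which is best conductor?", ["iron", "feather"])
# -> "which is best conductor? \\n (a) iron (b) feather"
```

The result can be passed straight to run_model(...) from the earlier section.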

If you want to see how this encoding is done on our datasets, check out this script.

The datasets/tasks used in the experiments

While the datasets we used are all public, it could be a bit time-consuming to convert them all into text-to-text format. We're releasing the already-processed text-to-text datasets based on the encoding used in this work. Files are included in this Google Cloud bucket. Here is the script we used to convert each dataset into text-in/text-out format.

Prediction files

Reach out to DanielK if you want them! :)

Released Model Checkpoints

If you intend to create a QA system, you can use our QA-specialized models for your purpose:

T5 models

Note: In the experiments reported in our paper, we always used the checkpoint closest to 100k steps (it usually corresponds to checkpoint 1100500).

You can use these in two ways:

For more details see the T5 repository.

BART models

The BART models can be downloaded from this link (3.6G). For detailed instructions on running the code (training/finetuning/testing), please refer to here. The uncased models usually gave us better and more robust results.

v2 T5 models

Note: In the experiments reported in our paper, we always used the checkpoint closest to 250k steps.

FAQ

I am not getting the expected results. A common issue with UnifiedQA is forgetting to use the separator (\n) when encoding your inputs. See the earlier section, where we delineate how to encode the inputs.
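A quick sanity check, assuming the conventions from the examples above: the separator is the two-character sequence backslash + n (written "\\n" inside a Python string literal), not an actual newline character:

```python
# literal backslash-n separator, as in the run_model examples above
good = "which is best conductor? \\n (a) iron (b) feather"

# an actual newline character -- NOT the separator shown in the examples
bad = "which is best conductor? \n (a) iron (b) feather"

assert "\\n" in good      # contains the two characters '\' and 'n'
assert "\\n" not in bad   # a real newline is a different (single) character
```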

Help! I am getting the following error! See this discussion if you're getting the following error:

ValueError: Configurable 'make_layer_stack' doesn't have a parameter named 'use_universal_transformer'.
  In file "gs://danielk-files/t5-models/union_mixture/11B/operative_config.gin", line 83

How to cite

If you extend or use this work, please cite the relevant papers:

@inproceedings{2020unifiedqa,
    title={UnifiedQA: Crossing Format Boundaries With a Single QA System},
    author={D. Khashabi and S. Min and T. Khot and A. Sabharwal and O. Tafjord and P. Clark and H. Hajishirzi},
    booktitle={Findings of EMNLP},
    year={2020}
}
@article{khashabi2022unifiedqa,
    title={UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training},
    author={Khashabi, Daniel and Kordi, Yeganeh and Hajishirzi, Hannaneh},
    journal={arXiv preprint arXiv:2202.12359},
    year={2022}
}