XOR QA: Cross-lingual Open-Retrieval Question Answering

Tasks | Download | Baselines | Evaluation | Website and Leaderboard | Paper | Updates

Introduction

XOR-TyDi QA brings together for the first time information-seeking questions, open-retrieval QA, and multilingual QA to create a multilingual open-retrieval QA dataset that enables cross-lingual answer retrieval. It consists of questions written by information-seeking native speakers in 7 typologically diverse languages and answer annotations that are retrieved from multilingual document collections.

The Tasks

There are three sub-tasks: XOR-Retrieve, XOR-English Span, and XOR-Full.

(Figure: overview of the three XOR-TyDi QA tasks)

XOR-Retrieve

XOR-Retrieve is a cross-lingual retrieval task where a question is written in the target language (e.g., Japanese) and a system is required to retrieve English documents that answer the question.

XOR-English Span

XOR-English Span is a cross-lingual retrieval task where a question is written in the target language (e.g., Japanese) and a system is required to output a short answer in English.

XOR-Full

XOR-Full is a cross-lingual retrieval task where a question is written in the target language (e.g., Japanese) and a system is required to output a short answer in the target language.

Download the Dataset

You can download the data at the following URLs.

The datasets below include question and short answer information only. If you need the long answer information for supervised training of retrievers or readers, please download the GoldParagraph data.

We also ask you to use the Wikipedia 2019-02-01 dump, which can be downloaded via the links in TyDi QA's source data list, for the 7 languages plus English.

Note (April 12, 2021): Please note that we modified the XOR-TyDi QA data and released a new version, XOR-TyDi (v1.1). All of the data available for download here is v1.1, and the leaderboard results are based on v1.1.

Data for XOR tasks

For XOR-Retrieve and XOR-English Span:

For XOR-Full:

Additional resources

Gold Paragraph Data

Question translation data

We also make the human-annotated 30k question translation data publicly available. Since the translation data was used only for annotation and we do not expect systems to use this oracle translation, we release translations for the training data only.

The translation data for each language pair (English-{Arabic, Bengali, Finnish, Japanese, Korean, Russian, Telugu}) is represented as a pair of text files, in which each line contains one sentence, aligned line-by-line with the translated English question, following common MT corpus formats.
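Line-aligned parallel files like these can be read with a simple zip over the two files. This is only a sketch; the file paths below are hypothetical placeholders, not names from this release:

```python
def read_parallel(src_path, tgt_path):
    """Read a line-aligned parallel corpus into (source, English) pairs."""
    with open(src_path, encoding="utf-8") as f_src, \
         open(tgt_path, encoding="utf-8") as f_tgt:
        return [(s.strip(), t.strip()) for s, t in zip(f_src, f_tgt)]
```

For example, `read_parallel("questions.ja", "questions.en")` would return a list of (Japanese question, English translation) pairs.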

The list of the links to parallel corpora (L_i-to-English) is below:

Building a baseline system

Our baselines include: Dense Passage Retriever (Karpukhin et al., 2020), Path Retriever (Asai et al., 2020), and BM25 (implementation based on Elasticsearch), each combined with multilingual QA models.

Please see baselines/README.md for more information.

Evaluation

To evaluate your models' predictions on the development data, run the commands below. Please see the details of the prediction file format and make sure your prediction files follow that format.

You also need to install MeCab and NLTK before running evaluation; they are used for tokenization in XOR-Retrieve evaluation and for evaluating Japanese answers.

pip install mecab-python3
pip install unidic-lite
pip install nltk
python3 evals/eval_xor_retrieve.py \
    --data_file <path_to_input_data> \
    --pred_file <path_to_predictions>
python3 evals/eval_xor_engspan.py \
    --data_file <path_to_input_data> \
    --pred_file <path_to_predictions>
python3 evals/eval_xor_full.py \
    --data_file <path_to_input_data> \
    --pred_file <path_to_predictions>

Prediction file format

To evaluate your model's predictions, you need to format them as described below.

XOR-Retrieve

Note: Our evaluation script evaluates whether a correct answer is included in the first 2,000 tokens and the first 5,000 tokens of your retrieved documents for R@2kt and R@5kt, respectively. Please make sure the total number of tokens in your retrieved documents is at least 5,000; otherwise your scores might be underestimated. See our paper for the detailed definitions of these metrics.
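As a rough sanity check before submitting, you can verify that each example's retrieved passages reach the 5,000-token budget. This sketch approximates token counts with whitespace splitting; the official script (evals/eval_xor_retrieve.py) defines the exact tokenization:

```python
def total_tokens(ctxs):
    """Approximate token count of retrieved passages (whitespace split)."""
    return sum(len(passage.split()) for passage in ctxs)

def meets_budget(ctxs, budget=5000):
    """True if the retrieved passages cover at least `budget` tokens."""
    return total_tokens(ctxs) >= budget
```

Retrieving a few extra passages per question is a cheap way to stay above the budget.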

The XOR-Retrieve file should be output as follows:

[
    {"id": 12345, "lang": "ja", "ctxs": ["Tokyo (東京) is the capital and most populous prefecture of Japan.", "Located at the head of Tokyo Bay, the prefecture forms part of the Kantō region on the central Pacific coast of Japan's main island, Honshu. ", ... ]},
    ...
]
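A minimal sketch of producing such a file, where `retrieve` is a hypothetical stand-in for your retriever (any function mapping a question string to a list of passage strings):

```python
import json

def write_retrieve_predictions(questions, retrieve, out_path):
    """Write XOR-Retrieve predictions: one {id, lang, ctxs} record per question."""
    preds = [
        {"id": q["id"], "lang": q["lang"], "ctxs": retrieve(q["question"])}
        for q in questions
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(preds, f, ensure_ascii=False)
```

`ensure_ascii=False` keeps non-Latin passages readable in the output file rather than escaping them.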

XOR-Full, XOR-English Span

For those two tasks, your prediction file should be a json file of a dictionary, whose keys are question ids and values are the predicted short answers.

e.g.,

{"12345": "東京", "67890": "Москва", ...}
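Writing that file is a one-liner with `json.dump`; a sketch assuming `answers` maps question ids to predicted short answers:

```python
import json

def write_answer_predictions(answers, out_path):
    """Write XOR-Full / XOR-English Span predictions: {question_id: answer}."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(answers, f, ensure_ascii=False)
```

As above, `ensure_ascii=False` preserves answers in non-Latin scripts (e.g., 東京) as-is.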

Submission guide

If you want to submit to our leaderboard, please create the prediction files on our test data for your target task and email them to Akari Asai (akari[at]cs.washington.edu).

Please make sure you include the following information in the email.

Notes

Updates

Citation and Contact

If you find this codebase useful or use the data in your work, please cite our paper.

@inproceedings{xorqa,
    title   = {{XOR} {QA}: Cross-lingual Open-Retrieval Question Answering},
    author  = {Akari Asai and Jungo Kasai and Jonathan H. Clark and Kenton Lee and Eunsol Choi and Hannaneh Hajishirzi},
    booktitle={NAACL-HLT},
    year    = {2021}
}

If you use the XOR-TyDi QA data, please also make sure to cite the original TyDi QA paper, on which XOR-TyDi is built:

@article{tydiqa,
    title   = {TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages},
    author  = {Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},
    journal = {TACL},
    year    = {2020}
}

Please contact Akari Asai (@AkariAsai, akari[at]cs.washington.edu) for questions and suggestions.