# Lost in the Middle: How Language Models Use Long Contexts
This repository contains accompanying material for *Lost in the Middle: How Language Models Use Long Contexts*.
## Table of Contents
- Installation
- Multi-Document Question Answering Experiments
- Multi-Document Question Answering Data
- Key-Value Retrieval Experiments
- Key-Value Retrieval Data
- Testing Prompting Templates
- References
## Installation
- Set up a conda environment:

      conda create -n lost-in-the-middle python=3.9 --yes
      conda activate lost-in-the-middle

- Install the package and its requirements:

      pip install -e .

- (optional) Set up pre-commit hooks for development:

      pre-commit install
## Multi-Document Question Answering Experiments
See EXPERIMENTS.md for instructions to run and evaluate models on the multi-document QA task.
## Multi-Document Question Answering Data
`qa_data/` contains multi-document question answering data for the oracle setting (1 input document, which is exactly the passage that answers the question) and the 10-, 20-, and 30-document settings (where 1 input passage answers the question and the other passages do not contain an NQ-annotated answer). Each line of these gzipped files is a JSON record in the following format:
    {
      "question": "who got the first nobel prize in physics",
      "answers": [
        "Wilhelm Conrad Röntgen"
      ],
      "ctxs": [
        ...
        {
          "id": <string id, e.g., "71445">,
          "title": <string title of the wikipedia article that this passage comes from>,
          "text": <string content of the passage>,
          "score": <string relevance score, e.g., "1.0510446">,
          "hasanswer": <boolean, whether any of the values in the `answers` key appears in the text>,
          "original_retrieval_index": <int indicating the original retrieval index; for example, a value of 0 indicates that this was the top retrieved document>,
          "isgold": <boolean indicating whether this passage is the gold answer from NaturalQuestions>
        },
        ...
      ],
      "nq_annotated_gold": {
        "title": <string title of the wikipedia article containing the answer, as annotated in NaturalQuestions>,
        "long_answer": <string content of the paragraph element containing the answer, as annotated in NaturalQuestions>,
        "chunked_long_answer": <string content of the paragraph element containing the answer, randomly chunked to approximately 100 words>,
        "short_answers": [
          <string short answers, as annotated in NaturalQuestions>
        ]
      }
    }
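For example, a minimal sketch (standard library only) that reads the first example from one of these files and locates the gold passage; the filename assumes the 20-document file generated in the next subsection:

    import gzip
    import json

    # Read the first example from the 20-document setting with the gold passage at position 0.
    path = "qa_data/nq-open-20_total_documents_gold_at_0.jsonl.gz"
    with gzip.open(path, "rt") as f:
        example = json.loads(next(f))

    print(example["question"])   # e.g., "who got the first nobel prize in physics"
    print(example["answers"])    # e.g., ["Wilhelm Conrad Röntgen"]
    print(len(example["ctxs"]))  # 20 passages in this setting
    # Position of the gold (NQ-annotated) passage among the input passages.
    print([ctx["isgold"] for ctx in example["ctxs"]].index(True))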
### Generating new multi-document QA data
- First, download Contriever retrieval results for each of the queries:

      wget https://nlp.stanford.edu/data/nfliu/lost-in-the-middle/nq-open-contriever-msmarco-retrieved-documents.jsonl.gz
- Then, to generate examples with 20 total documents and the relevant document placed at position 0, 4, 9, 14, or 19 (one output file per position), run:

      for gold_index in 0 4 9 14 19; do
          python -u ./scripts/make_qa_data_from_retrieval_results.py \
              --input-path nq-open-contriever-msmarco-retrieved-documents.jsonl.gz \
              --num-total-documents 20 \
              --gold-index ${gold_index} \
              --output-path qa_data/nq-open-20_total_documents_gold_at_${gold_index}.jsonl.gz
      done
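After generation, a quick sanity check (a minimal sketch using the `isgold` field documented above) confirms that each example has 20 passages with the gold passage at the requested index:

    import gzip
    import json

    # Verify gold placement for one of the generated files.
    with gzip.open("qa_data/nq-open-20_total_documents_gold_at_9.jsonl.gz", "rt") as f:
        for line in f:
            example = json.loads(line)
            assert len(example["ctxs"]) == 20
            assert example["ctxs"][9]["isgold"]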
## Key-Value Retrieval Experiments
See EXPERIMENTS.md for instructions to run and evaluate models on the key-value retrieval task.
## Key-Value Retrieval Data
`kv_retrieval_data/` contains data for the synthetic key-value retrieval task, where each example is a collection of JSON key-value pairs plus a particular key to retrieve from it. Each line of these gzipped files is a JSON record in the following format:
    {
      "ordered_kv_records": [
        ...
        [
          "adde4211-888b-48e3-9dbe-66c66551b05f",
          "8bc3e90d-e044-4923-9a5d-7bf48917ed39"
        ],
        [
          "2a8d601d-1d69-4e64-9f90-8ad825a74195",
          "bb3ba2a5-7de8-434b-a86e-a88bb9fa7289"
        ],
        [
          "006b46ef-03fd-4380-938c-49cb03754370",
          "9faeacbe-1d0e-40da-a5db-df598a104880"
        ],
        ...
      ],
      "key": "2a8d601d-1d69-4e64-9f90-8ad825a74195",
      "value": "bb3ba2a5-7de8-434b-a86e-a88bb9fa7289"
    }
`ordered_kv_records` is a list of `[key, value]` pairs. The `key` specifies a particular key to retrieve from `ordered_kv_records`, and the `value` gives its expected associated value.
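For example, a minimal sketch (standard library only; the filename assumes the file generated in the next subsection) that reads one example and performs the retrieval in plain Python:

    import gzip
    import json

    with gzip.open("kv_retrieval_data/kv-retrieval-300_keys.jsonl.gz", "rt") as f:
        example = json.loads(next(f))

    # A model sees the serialized records plus `key` and must produce `value`.
    records = {key: value for key, value in example["ordered_kv_records"]}
    assert records[example["key"]] == example["value"]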
### Generating new key-value retrieval data
To generate new key-value retrieval data, use:
    python -u ./scripts/make_kv_retrieval_data.py \
        --num-keys 300 \
        --num-examples 500 \
        --output-path kv_retrieval_data/kv-retrieval-300_keys.jsonl.gz
## Testing Prompting Templates
Code for converting the examples into string prompts lives in `src/lost_in_the_middle/prompting.py`.
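As a hypothetical usage sketch (the `Document` and `get_qa_prompt` names here are assumptions for illustration, not a confirmed API; see `src/lost_in_the_middle/prompting.py` for the actual interface):

    # Hypothetical: `Document` and `get_qa_prompt` are assumed names; check
    # src/lost_in_the_middle/prompting.py for the actual classes and functions.
    from lost_in_the_middle.prompting import Document, get_qa_prompt

    documents = [Document(title="Nobel Prize in Physics",
                          text="Wilhelm Conrad Röntgen received the first Nobel Prize in Physics.")]
    prompt = get_qa_prompt("who got the first nobel prize in physics", documents)
    print(prompt)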
After following the installation instructions above, you can run tests with:
    $ py.test tests/
    ========================================= test session starts =========================================
    platform linux -- Python 3.9.17, pytest-7.4.0, pluggy-1.2.0
    rootdir: /home/nfliu/git/lost-in-the-middle
    collected 7 items

    tests/test_prompting.py .......                                                                 [100%]

    ========================================== 7 passed in 0.08s ==========================================
## References
Please consider citing our work if you found this code or our paper beneficial to your research.
    @misc{liu-etal:2023:arxiv,
        author = {Nelson F. Liu and Kevin Lin and John Hewitt and Ashwin Paranjape and Michele Bevilacqua and Fabio Petroni and Percy Liang},
        title = {Lost in the Middle: How Language Models Use Long Contexts},
        note = {arXiv:2307.03172},
        year = {2023}
    }