Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

This includes an original implementation of "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?" by Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer.

This repository provides the code and scripts needed to reproduce the experiments in the paper.

Please leave issues for any questions about the paper or the code.

If you find our code or paper useful, please cite the paper:

@inproceedings{min2022rethinking,
    title={Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?},
    author={Min, Sewon and Lyu, Xinxi and Holtzman, Ari and Artetxe, Mikel and Lewis, Mike and Hajishirzi, Hannaneh and Zettlemoyer, Luke},
    booktitle={EMNLP},
    year={2022}
}

Announcements

Contents

  1. Preparation
  2. Reproducing Main Experiments (Section 4.1 of the paper)
  3. Reproducing Ablations (Section 4.2 of the paper)
  4. Reproducing Analysis (Section 5 of the paper)

Preparation

The code is tested with Python 3.8.

The data and the code are based on the MetaICL codebase.

git remote add metaicl https://github.com/facebookresearch/MetaICL.git
git pull metaicl main --allow-unrelated-histories -X ours

Install the data dependencies and download the data.

conda create --name metaicl-data python=3.8
conda activate metaicl-data
pip install datasets==1.4.0 wget
cd preprocess
python _build_gym.py --build --n_proc=40 --do_test

This uses k=16 by default. If you want to run ablations with varying k, please also run the following.

python _build_gym.py --build --n_proc=40 --do_test --test_k {4|8|32}

After preprocessing is done, return to the main directory.

cd ../
conda deactivate

Now, install the model dependencies to run the model. Please note that the Transformers version is not compatible with the datasets library used to download the data, so make sure to use a separate environment.

conda create --name metaicl python=3.8
conda activate metaicl
pip install torch==1.9.0
pip install git+https://github.com/huggingface/transformers.git@c37573806ab3526dd805c49cbe2489ad4d68a9d7

(Optional) Install the OpenAI Python library for running GPT-3:

pip install openai

Reproducing Main Experiments

This is for reproducing experiments in Section 4.1 of the paper. Evaluation datasets are:

No Demonstrations

To run the evaluation of No-Demonstrations:

# Direct GPT-2 Large
python test.py --dataset {dataset} --gpt2 gpt2-large --method direct --out_dir out/gpt2-large --do_zeroshot
# Channel GPT-2 Large
python test.py --dataset {dataset} --gpt2 gpt2-large --method channel --out_dir out/gpt2-large --do_zeroshot
# Direct MetaICL
python test.py --dataset {dataset} --gpt2 metaicl --method direct --out_dir out/metaicl --do_zeroshot
# Channel MetaICL
python test.py --dataset {dataset} --gpt2 channel-metaicl --method channel --out_dir out/channel-metaicl --do_zeroshot
# Direct GPT-J
python test.py --dataset {dataset} --gpt2 gpt-j-6B --method direct --out_dir out/gpt-j --do_zeroshot
# Channel GPT-J
python test.py --dataset {dataset} --gpt2 gpt-j-6B --method channel --out_dir out/gpt-j --do_zeroshot
# GPT-3
python test_gpt3.py --dataset {dataset} --gpt3 {ada|babbage|curie|davinci} --method {direct|channel} --out_dir out/gpt3 --do_zeroshot --api {API key}
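The per-model commands above differ only in the checkpoint name, scoring method, and output directory. As a quick sanity aid, here is a hypothetical helper (not part of the repository) that assembles the same command lines; "sst2" is a placeholder dataset name for illustration.

```python
# Hypothetical helper that assembles the No-Demonstrations command lines
# shown above; the model/method/out_dir combinations mirror the examples.
CONFIGS = [
    ("gpt2-large", "direct", "out/gpt2-large"),
    ("gpt2-large", "channel", "out/gpt2-large"),
    ("metaicl", "direct", "out/metaicl"),
    ("channel-metaicl", "channel", "out/channel-metaicl"),
    ("gpt-j-6B", "direct", "out/gpt-j"),
    ("gpt-j-6B", "channel", "out/gpt-j"),
]

def zero_shot_command(dataset, model, method, out_dir):
    """Build one test.py invocation for the no-demonstrations setting."""
    return (f"python test.py --dataset {dataset} --gpt2 {model} "
            f"--method {method} --out_dir {out_dir} --do_zeroshot")

# "sst2" is a placeholder dataset name.
commands = [zero_shot_command("sst2", m, meth, out) for m, meth, out in CONFIGS]
for cmd in commands:
    print(cmd)
```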

Note that test.py and test_gpt3.py do not support multi-GPU inference.

Other useful flags:

Notes for running GPT-3:

From now on, we treat the commands above as the defaults and only note which flags you need to add.

Demonstrations with gold labels

Run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87.

Demonstrations with random labels

Create the demonstrations with random labels via:

python create_data.py --variant random --dataset {dataset}

Then, run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87 --dataset {dataset}_random.
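The pattern is the same for all variants: take the default command, append the demonstration flags, and swap in the renamed dataset. A hypothetical sketch (not part of the repository), using the direct GPT-2 Large command as the base and "sst2" as a placeholder dataset name:

```python
# Hypothetical sketch: the random-label run is the default command plus
# demonstration flags, with the dataset renamed to the "_random" variant
# produced by create_data.py above.
BASE = ("python test.py --dataset {dataset} --gpt2 gpt2-large "
        "--method direct --out_dir out/gpt2-large --do_zeroshot")
DEMO_FLAGS = "--use_demonstrations --k 16 --seed 100,13,21,42,87"

def random_label_command(dataset):
    return BASE.format(dataset=f"{dataset}_random") + " " + DEMO_FLAGS

print(random_label_command("sst2"))  # "sst2" is a placeholder dataset name
```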

Reproducing Ablations

This is for reproducing experiments in Section 4.2 of the paper. Evaluation datasets are:

Number of correct labels

Create the demonstrations with varying number of correct labels via:

python create_data.py --variant {75|50|25|0}_correct --dataset {dataset}

Then, run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87 --dataset {dataset}_{75|50|25|0}_correct.
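To run the whole ablation for a dataset, you loop over the four fractions of correct labels. A hypothetical sketch of the resulting dataset names, following the "{dataset}_{pct}_correct" pattern above ("sst2" is a placeholder):

```python
# Hypothetical loop over the correct-label fractions in this ablation.
FRACTIONS = [75, 50, 25, 0]

def correctness_variants(dataset):
    """Dataset names for each fraction of correct labels."""
    return [f"{dataset}_{pct}_correct" for pct in FRACTIONS]

print(correctness_variants("sst2"))  # placeholder dataset name
```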

Number of input-label pairs in the demonstrations

(Note that you must have run preprocessing with varying k to run this ablation. If you have not, please revisit the Preparation section.)

Create the demonstrations with varying k via:

python create_data.py --variant random --dataset {dataset} --k {4|8|16|32}

Then, run the default commands, adding --use_demonstrations --k {4|8|16|32} --seed 100,13,21,42,87 --dataset {dataset}_random.
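The key constraint in this ablation is that the --k passed to test.py must match the k used when the "{dataset}_random" demonstrations were created. A hypothetical sketch (direct GPT-2 Large base, "sst2" as a placeholder dataset name):

```python
# Hypothetical sketch: --k must match a value preprocessed earlier.
K_VALUES = [4, 8, 16, 32]

def k_ablation_command(dataset, k):
    assert k in K_VALUES, "k must match a preprocessed value"
    return (f"python test.py --dataset {dataset}_random --gpt2 gpt2-large "
            f"--method direct --out_dir out/gpt2-large --do_zeroshot "
            f"--use_demonstrations --k {k} --seed 100,13,21,42,87")

print(k_ablation_command("sst2", 8))  # placeholder dataset name
```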

Using manual templates

Create the demonstrations with varying type of labels and inference method via:

python create_data.py --variant {gold|random}_w_template --dataset {dataset} --method {direct|channel}

Then, run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87 --dataset {dataset}_{gold|random}_w_template_{direct|channel}.
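Since the label type and the inference method vary independently, each dataset yields four template variants. A hypothetical enumeration of the dataset names, following the "{dataset}_{gold|random}_w_template_{direct|channel}" pattern above ("sst2" is a placeholder):

```python
# Hypothetical enumeration of the four template-variant dataset names.
from itertools import product

def template_variants(dataset):
    return [f"{dataset}_{labels}_w_template_{method}"
            for labels, method in product(("gold", "random"),
                                          ("direct", "channel"))]

print(template_variants("sst2"))  # placeholder dataset name
```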

Reproducing Analysis

This is for reproducing experiments in Section 5 of the paper. Evaluation datasets are:

Demonstrations with OOD input text

First, you need a corpus file in .txt format, where each line is one plain-text sentence. In the paper, we used samples from the English portion of CC News, which we are unable to release here. Please visit this link to learn how to download the CC News corpus.
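The expected corpus format is simply one sentence per line. A hypothetical sanity check (not part of the repository) for a file you plan to pass via --corpus_path, demonstrated on a temporary two-line corpus:

```python
import os
import tempfile

# Hypothetical check that a corpus file matches the expected format:
# plain text, one non-empty sentence per line.
def load_corpus(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Self-contained demo with a temporary two-line corpus.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("First sentence.\nSecond sentence.\n")
    path = f.name
sentences = load_corpus(path)
os.remove(path)
print(len(sentences))
```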

Create the demonstrations with OOD input text via:

python create_data.py --variant ood_inputs --dataset {dataset} --corpus_path {corpus_path}

Then, run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87 --dataset {dataset}_ood_inputs.

Demonstrations with random English words

Create the demonstrations with random English words as labels via:

python create_data.py --variant random_english_words --dataset {dataset}

Then, run the default commands, adding --use_demonstrations --k 16 --seed {seed} --dataset {dataset}_random_english_words_seed={seed}, where {seed} is one of 100, 13, 21, 42, and 87.
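Unlike the other variants, the random-English-words runs pair each seed with its own dataset copy rather than passing all seeds at once. A hypothetical sketch of the per-seed names, following the "..._seed={seed}" pattern above ("sst2" is a placeholder):

```python
# Hypothetical per-seed (seed, dataset name) pairs for this variant.
SEEDS = [100, 13, 21, 42, 87]

def per_seed_runs(dataset):
    return [(seed, f"{dataset}_random_english_words_seed={seed}")
            for seed in SEEDS]

for seed, name in per_seed_runs("sst2"):  # placeholder dataset name
    print(seed, name)
```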

Demonstrations with random labels only (no inputs)

Create the demonstrations with random labels only via:

python create_data.py --variant random_labels_only --dataset {dataset}

Then, run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87 --dataset {dataset}_random_labels_only.

Demonstrations with no labels (inputs only)

Create the demonstrations with no labels via:

python create_data.py --variant no_labels --dataset {dataset}

Then, run the default commands, adding --use_demonstrations --k 16 --seed 100,13,21,42,87 --dataset {dataset}_no_labels.