Home

Awesome

Channel LM Prompting (and beyond)

This includes an original implementation of Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer. "Noisy Channel Language Model Prompting for Few-Shot Text Classification" 2021.

<p align="center"> <img src="img/teaser.png" width="50%" height="50%"> </p>

For any questions about the paper or the code, or to request pretrained checkpoints, please contact the first author (email) or leave issues.

If you find our code or paper useful, please cite the paper:

@article{ min2021noisy ,
  title={ Noisy Channel Language Model Prompting for Few-Shot Text Classification },
  author={ Min, Sewon and Lewis, Mike and Hajishirzi, Hannaneh and Zettlemoyer, Luke },
  journal={ arXiv preprint },
  year={ 2021 }
}

This also includes implementations of many recent papers studying prompt-based learning. Please make sure to cite corresponding papers when you use implementations of the methods in this repo.

Content

  1. Installation
  2. Download & Preprocess Data
  3. Demonstration-based methods
  4. Tuning methods

You can run the channel model and the direct model for each of these methods. Please see Section 3 of the paper for more details about these formulations.

Installation

$ conda create -n lm-prompt python=3.8
$ conda activate lm-prompt
$ conda install pytorch=1.7.1 -c pytorch
$ pip install transformers==4.3.0

Download and Preprocess Data

We use (and modify) the data and the preprocessing script from Gao et al. ACL 2021 (paper, code) and Zhang et al. NeurIPS 2015 (paper, data).

To download the k-shot data (already preprocessed): Download the data (776MB) from this link. Pleae place data.zip under the same directory as the code and unzip it.

To download the original data and preprocess yourself:

pip install pandas==1.1.5 # for preprocessing script
mkdir data
cd data
wget https://nlp.cs.princeton.edu/projects/lm-bff/datasets.tar
tar xvf datasets.tar
cd ..

Also, download TextClassificationDatasets.zip from here (download the entire directory as in the figure below) and place it under data/original.

<p align="center"> <img src="img/data_download.png" width="30%" height="30%"> </p>

Then, run python3 generate_k_shot_data.py, and you are done!

Optionally, you can specify arguments such as

To check the data: You can see the list of eleven datasets used in the paper by ls data/k-shot. Each dataset consists of five different splits based on five different splits (test sets are the same).

Demonstration-based methods

<p align="center"> <img src="img/demonstration.png" width="70%" height="70%"> </p>

This section is for methods which does not update any of the model parameters. For details about methods, please see Section 4.1 of the paper.

Zero-shot

python main.py \
    --task {task_name} \
    --split {dev|test} \
    --data_dir data \
    --out_dir out \
    --gpt2 gpt2-large \
    --do_zeroshot \
    --method {direct|channel}

This command will run zero-shot inference using GPT2-large using four different templates (verbalizers) as reported in the paper.

Useful notes:

Concat-based demonstration

python main.py \
    --task {task_name} \
    --split {dev|test} \
    --data_dir data \
    --out_dir out \
    --gpt2 gpt2-large \
    --do_zeroshot \
    --method {direct|channel} \
    --use_demonstrations \
    --k 16 \
    --seed {13|21|42|87|100}

Ensemble-based demonstration

Add --ensemble to the command line for the Concat-based demonstration method.

Tuning methods

<p align="center"> <img src="img/tuning.png" width="70%" height="70%"> </p>

This section is for methods that fully finetune the model parameters (standard finetuning), or update a very limited number of parameters (prompt tuning, head tuning and transformation tuning). For details about the methods, please see Section 4.2 of the paper.

Prompt tuning

python main.py \
    --task {task_name} \
    --split {dev|test} \
    --data_dir data \
    --out_dir out \
    --gpt2 gpt2-large \
    --method {direct|channel} \
    --prompt_tune \
    --do_train \
    --batch_size 32 \
    --lr {0.1|0.01|0.001}

Head tuning

Use --head_tune instead of --prompt_tune to the command line for the Prompt tuning method. Note that head tuning is only for the direct baseline.

Transformation tuning

Use --transform_tune instead of --prompt_tune to the command line for the Prompt tuning method. Note that transformation tuning is only for the direct baseline.

Standard finetuning

To finetune the entire model parameters, as in typical finetuning, please do not specify any of --prompt_tune, --head_tune or --transform_tune.

Results

For all results, please check out Table 3 and Table 4 of the paper.