Iteratively Prompt Pre-trained Language Models for Chain of Thought
Original implementation of the EMNLP 2022 paper "Iteratively Prompt Pre-trained Language Models for Chain of Thought" by Boshi Wang, Xiang Deng, and Huan Sun.
Environment Setup
First, make sure Python >= 3.8 is installed, e.g.:

```bash
conda create -n <YOUR_ENV_NAME> python=3.8
```
Install dependencies via:

```bash
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
cd transformers
pip install -e .
cd ..
pip install -e .
pip install -r requirements-dev.txt
```
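As a quick sanity check that the pinned CUDA 11.3 build of PyTorch is usable on your machine (optional; this only verifies the install above):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```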
Repo Tour
```
.
├── dataset_*/                    # preprocessed datasets
│   ├── QA/                       # query (q) -> answer (a), for PLM-QA
│   ├── IP/                       # [q; c_1; ...; c_{j-1}] -> c_j, for iterative prompting
│   ├── IP_single/                # q -> [c_1; ...; c_{n_q}], for non-iterative prompting
│   ├── RD_Oracle/                # [q; c_1; ...; c_{n_q}] -> a, for the oracle reader
│   └── KE/                       # c_j (masked) -> c_j, for PLM knowledge enhancement
├── job_*/                        # commands for training
├── eval_*/                       # commands for evaluation
├── simpletransformers/seq2seq/   # main implementation of iCAP, with necessary modifications in transformers/
├── ...
├── utils.py                      # helper functions
├── soft_embedding.py             # soft embedding for virtual prompt tokens
└── main.py                       # main script for training/evaluation
```
Our main code framework borrows from this repo, and the soft embedding module is adapted from this implementation.
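To make the dataset variants above concrete, here is a hypothetical 2-hop example (purely illustrative; the question, contexts, and on-disk file format are not taken from the repo):

```
q   = "Where was the director of film F born?"
c_1 = "Film F was directed by D."
c_2 = "D was born in city Y."
a   = "city Y"

QA:         q -> a
IP:         q -> c_1,   [q; c_1] -> c_2
IP_single:  q -> [c_1; c_2]
RD_Oracle:  [q; c_1; c_2] -> a
KE:         c_j (masked) -> c_j,   e.g. "D was born in [MASK]." -> c_2
```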
Usage
Our scripts are written for a cluster with a SLURM scheduler; use `bash` instead of `sbatch` to run them on regular servers. Remember to replace the `<...>` parts according to your setup, and adjust the `train_batch_size` and `gradient_accumulation_steps` args according to your GPU memory. The following commands are for the 2wiki experiments; the other datasets are analogous.
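The memory/batch-size trade-off in a nutshell (illustrative values only; how these args are passed depends on the particular `*-train.sh` script):

```bash
# Effective batch size = train_batch_size * gradient_accumulation_steps.
# Both settings below train with an effective batch size of 32, but the second
# needs roughly a quarter of the per-step GPU memory:
#   --train_batch_size 32 --gradient_accumulation_steps 1
#   --train_batch_size 8  --gradient_accumulation_steps 4
```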
```bash
cd job_2wiki
```
Knowledge Enhancement
```bash
sbatch KE-train.sh
```
Alternatively, download the trained model checkpoints for 2wiki, lot, and r4c.
Training
```bash
sbatch ${METHOD}-train.sh
```

where `METHOD` is one of:
- `iCAP`: the proposed iterative context-aware prompter
- `iCAP_stopper`: iCAP with the stopper module
- `PromptT`: Prompt-Tuning
- `PromptT_iter`: Prompt-Tuning (iter)
- `PLMFT`: PLM fine-tuning
- `PLMFT_iter`: PLM fine-tuning (iter)
- `PLMQA`: fine-tuning the PLM on (Q, A) pairs directly
- `RD_Oracle`: the oracle reader
We used this implementation for Prefix-Tuning.
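For example, to train the proposed prompter (a minimal sketch; `iCAP` is one of the `METHOD` values listed above):

```bash
METHOD=iCAP
sbatch ${METHOD}-train.sh    # or: bash ${METHOD}-train.sh on a machine without SLURM
```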
Evaluation
```bash
cd eval_2wiki
```
Intrinsic Evaluation
Run

```bash
python gen_eval_script_${METHOD}.py
```

to generate the scripts for running predictions and evaluation. Then run

```bash
bash run_pred_all_${SAVE_PATH}.sh
```

to get predictions, and

```bash
bash eval_on_{valid/test}_all_${SAVE_PATH}.sh
```

to evaluate the predictions.
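For example, for the `iCAP` runs (a sketch; here `<SAVE_PATH>` is assumed to be the save path used during training, which determines the generated script names):

```bash
python gen_eval_script_iCAP.py
bash run_pred_all_<SAVE_PATH>.sh
bash eval_on_valid_all_<SAVE_PATH>.sh
bash eval_on_test_all_<SAVE_PATH>.sh
```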
Extrinsic Evaluation
First, prepare a dataset from the predicted contexts; this can be done with `prep_reader.py`, e.g.:

```bash
python prep_reader.py --path dataset_2wiki_0.1/iCAP_RD/ --train <prediction file on train> --valid <prediction file on valid> --test <prediction file on test>
```
Then fine-tune the trained oracle reader on this dataset; the results can be evaluated by setting `--eval_type qa` in `eval_qa.py`.
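Putting the extrinsic pipeline together, a rough outline (the reader fine-tuning step and the remaining `eval_qa.py` arguments depend on your own training setup; only `--eval_type qa` is prescribed above):

```bash
# 1. Build a reader dataset from the predicted contexts (paths are placeholders):
python prep_reader.py --path dataset_2wiki_0.1/iCAP_RD/ \
    --train <prediction file on train> --valid <prediction file on valid> --test <prediction file on test>
# 2. Fine-tune the trained oracle reader on dataset_2wiki_0.1/iCAP_RD/
#    (reuse your RD_Oracle setup, pointed at this new dataset).
# 3. Evaluate the answer predictions:
python eval_qa.py --eval_type qa <other args as in your setup>
```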
Citation
```bibtex
@inproceedings{wang2022iterative,
  title={Iteratively Prompt Pre-trained Language Models for Chain of Thought},
  author={Wang, Boshi and Deng, Xiang and Sun, Huan},
  booktitle={EMNLP},
  year={2022}
}
```