ATTEMPT: Attentional Mixture of Prompt Tuning
This repository includes the original implementation of Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi. "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts". In Proc. of EMNLP 2022.
Acknowledgements: We used Hugging Face's transformers and datasets libraries. The implementations of the baselines are from the Compacter repository. Huge thanks to the contributors of those amazing repositories!
Installation
Please run the commands below to install the dependencies.
conda create -n attempt_env python=3.8
conda activate attempt_env
python setup.py develop
ATTEMPT
ATTEMPT training consists of two steps: Source Prompt Training and Target Prompt Training.
Training
- Source Prompt Training: ATTEMPT first trains a set of soft prompts, which we call source prompts, on several large-scale datasets.
- Target Prompt Training: For a target task, ATTEMPT initializes a new target task prompt as well as an attention module G, and learns to interpolate the source prompts and the new target task prompt using the attention weights generated by G.
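To make the interpolation step concrete, here is a simplified, framework-free sketch of one forward pass for a single input. It is not the repository's implementation: it assumes max-pooled token representations, models the attention module G as a single hypothetical linear map (g_weight), and adds the target prompt back to the attention-weighted sum of all prompts. Treat these details as assumptions rather than the exact method.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # a prompt or an input: [num_tokens][hidden_dim]


def max_pool(rows: Matrix) -> Vector:
    """Element-wise max over tokens, giving one hidden_dim vector."""
    return [max(column) for column in zip(*rows)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector, temperature: float = 1.0) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_prompts(
    input_embeds: Matrix,
    source_prompts: List[Matrix],
    target_prompt: Matrix,
    g_weight: Matrix,
    temperature: float = 1.0,
) -> Tuple[Matrix, Vector]:
    """Interpolate the source prompts and the target prompt for one input.

    g_weight is a hypothetical stand-in for the attention module G:
    a single hidden_dim x hidden_dim linear map.
    """
    # Summarize the input and project it with G.
    pooled_input = max_pool(input_embeds)
    query = [dot(row, pooled_input) for row in g_weight]
    # Score every candidate prompt (all source prompts plus the target prompt).
    candidates = source_prompts + [target_prompt]
    weights = softmax([dot(query, max_pool(p)) for p in candidates], temperature)
    # Instance prompt = target prompt + attention-weighted sum of the candidates.
    mixed = [
        [
            target_prompt[i][d] + sum(w * p[i][d] for w, p in zip(weights, candidates))
            for d in range(len(target_prompt[0]))
        ]
        for i in range(len(target_prompt))
    ]
    return mixed, weights


# Toy example: hidden_dim = 2, prompt_length = 1, G = identity.
sources = [[[1.0, 0.0]], [[0.0, 1.0]]]
target = [[0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mixed, weights = mix_prompts([[1.0, 0.0], [0.5, 0.0]], sources, target, identity)
# weights is approximately [0.576, 0.212, 0.212]
```

The first source prompt points in the same direction as the input, so it receives the largest attention weight; the softmax keeps the weights summing to one.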
Source Prompt Training
python run_seq2seq.py prompt_tuning.json
Alternatively, you can download a set of trained source prompts by running the commands below:
cd attempt
wget https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip
unzip source_prompts
rm source_prompts.zip
cd ..
We provide source prompts for different sizes of T5 models (T5-base, T5-large, and T5-3B). See the Trained checkpoints section for more details.
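If you prefer to fetch checkpoints from Python (for example, on a machine without wget), the cd/wget/unzip/rm steps above can be sketched with the standard library only. download_and_extract and archive_name are hypothetical helper names, not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path


def archive_name(url: str) -> str:
    """File name of the archive, e.g. 'source_prompts.zip'."""
    return url.rsplit("/", 1)[-1]


def download_and_extract(url: str, dest_dir: Path) -> list:
    """Roughly: cd dest_dir; wget URL; unzip ARCHIVE; rm ARCHIVE."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    zip_path = dest_dir / archive_name(url)
    urllib.request.urlretrieve(url, str(zip_path))
    try:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest_dir)
            return archive.namelist()
    finally:
        zip_path.unlink()  # remove the .zip once it has been extracted


# Example (needs network access):
# download_and_extract(
#     "https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip",
#     Path("attempt"),
# )
```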
Target Prompt Training (single-task)
Once you obtain the source prompts, you can run target prompt training.
python run_seq2seq.py configs/attempt/single_task.json
Target Prompt Training (multi-task)
To train ATTEMPT on multiple target tasks simultaneously, as discussed in Section 3.3 of our paper (Mixed-task Mini-Batch Training), simply set multiple tasks in the task_name parameter. Make sure you also set dataset_config_name accordingly; you can just use "en" for each task. For example:
"task_name": ["superglue-boolq", "superglue-cb", "superglue-wic", "superglue-wsc.fixed"],
"dataset_config_name": ["en", "en", "en", "en"],
An example command to conduct multi-task training for SuperGLUE is as follows:
python run_seq2seq.py configs/attempt/multitask_superglue.json
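Because task_name and dataset_config_name must line up task by task, it can help to generate them together. A minimal sketch that reproduces the example entries above; multitask_config is a hypothetical helper, and base would typically be an existing config loaded with json.load.

```python
import json


def multitask_config(tasks, base=None, dataset_config="en"):
    """Return a copy of `base` set up for mixed-task training on `tasks`.

    task_name and dataset_config_name are kept the same length, with
    `dataset_config` ("en" by default) repeated once per task.
    """
    config = dict(base or {})
    config["task_name"] = list(tasks)
    config["dataset_config_name"] = [dataset_config] * len(config["task_name"])
    return config


superglue = ["superglue-boolq", "superglue-cb", "superglue-wic", "superglue-wsc.fixed"]
config = multitask_config(superglue)
print(json.dumps(config, indent=2))
```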
Evaluation
You can run evaluation with the eval_seq2seq.py script.
- Evaluate a trained model on a single target task:
python eval_seq2seq.py configs/attempt/eval_single_task.json
- Evaluate a trained model on multiple target tasks:
python eval_seq2seq.py configs/attempt/eval_superglue.json
Baselines
As with ATTEMPT, you can configure the parameters in a config.json file. See config for details on the hyperparameters.
The Adapter, Baseline, Prompt Tuning, and fine-tuning baseline implementations are mostly from the awesome Compacter repository, with some minor modifications.
Standard Fine-tuning
The command to run standard fine-tuning is shown below.
python run_seq2seq.py configs/baselines/finetuning.json
Prompt tuning
Prompt Tuning (Lester et al., 2021) inserts a small embedding (the prompt) in front of the input, which is then fed into a frozen LM. During training, only this prompt embedding is updated.
python run_seq2seq.py configs/baselines/prompt_tuning.json
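A framework-free sketch of the input side of prompt tuning, assuming the prompt is prepended at the embedding level: the prompt vectors are placed before the input token embeddings, and the attention mask is extended so the prompt positions are attended to. The helper names are hypothetical, and the frozen LM, loss, and optimizer (which would update only the prompt) are omitted.

```python
from typing import List

Matrix = List[List[float]]  # [num_tokens][hidden_dim]


def prepend_prompt(prompt: Matrix, input_embeds: Matrix) -> Matrix:
    """Return the prompt vectors followed by the input token embeddings."""
    if prompt and input_embeds and len(prompt[0]) != len(input_embeds[0]):
        raise ValueError("prompt and input embeddings must share hidden_dim")
    return [list(row) for row in prompt] + [list(row) for row in input_embeds]


def extend_attention_mask(mask: List[int], prompt_length: int) -> List[int]:
    """Prompt positions are always attended to, so they get mask value 1."""
    return [1] * prompt_length + list(mask)


prompt = [[0.1, 0.2], [0.3, 0.4]]  # prompt_length = 2; the only trainable values
tokens = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # embeddings from the frozen LM
combined = prepend_prompt(prompt, tokens)  # 5 rows: 2 prompt + 3 input
mask = extend_attention_mask([1, 1, 0], len(prompt))  # [1, 1, 1, 1, 0]
```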
SPoT
SPoT (Vu et al., 2022) initializes a target task prompt with a pretrained source prompt to boost prompt tuning performance. To run the SPoT baseline, you first need to obtain source prompts using the prompt tuning method.
We also provide a set of trained source prompts. See instructions at the Trained checkpoints section.
python run_seq2seq.py configs/baselines/spot.json
Important config parameters
- prompt_embedding_path (list of str): a list of the prompt embeddings you want to load.
- load_prefix_embeddings (bool): always set this to true for SPoT, so that your target task prompt is initialized with the prompt embedding passed via the prompt_embedding_path option.
- save_prefix_only (bool): set this to true if you want to save only the prompt embedding, to avoid saving a copy of the untouched LM every time!
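As a sketch of how these options fit together, the snippet below sets them on a config dictionary and prints it as JSON. spot_config and the prompt file name are hypothetical; in practice you would start from configs/baselines/spot.json and point prompt_embedding_path at the source prompt files you actually trained or downloaded.

```python
import json


def spot_config(prompt_paths, base=None):
    """Return a copy of `base` with the SPoT options described above."""
    config = dict(base or {})
    config["prompt_embedding_path"] = list(prompt_paths)  # a list of str
    config["load_prefix_embeddings"] = True  # always true for SPoT
    config["save_prefix_only"] = True  # store only the prompt, not the frozen LM
    return config


# Hypothetical prompt file name; use the path of your own source prompt.
config = spot_config(["source_prompts/my_source_prompt.pt"])
print(json.dumps(config, indent=2))
```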
Adapter
Adapter (Houlsby et al., 2019) inserts light-weight layers after transformer layers.
python run_seq2seq.py configs/baselines/adapter.json
Important config parameters
- task_reduction_factor (int): controls how much you reduce the number of parameters in the Adapters. A larger value means fewer parameters to update. By default, we set task_reduction_factor to 32, as in Mahabadi et al. (2021).
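To see how task_reduction_factor trades off size, here is a parameter count for a single bottleneck adapter, assuming a Houlsby-style layout: a down-projection to hidden_dim // task_reduction_factor units and an up-projection back, each with a bias. The exact adapter layout in this repository may differ; 768 is the hidden size of T5-base.

```python
def adapter_params(hidden_dim: int, reduction_factor: int) -> int:
    """Parameters in one bottleneck adapter (assumed layout, with biases)."""
    bottleneck = hidden_dim // reduction_factor
    down = hidden_dim * bottleneck + bottleneck  # down-projection weight + bias
    up = bottleneck * hidden_dim + hidden_dim  # up-projection weight + bias
    return down + up


for factor in (16, 32, 64):
    print(factor, adapter_params(768, factor))
# 16 74544
# 32 37656
# 64 19212
```

Doubling the reduction factor roughly halves the adapter size, which is why a bigger value means fewer parameters to update.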
BitFit
BitFit (Zaken et al., 2022) only updates the bias terms of the original LM for each task.
python run_seq2seq.py configs/baselines/bitfit.json
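BitFit amounts to choosing which parameters stay trainable. A minimal sketch over parameter names, where the names are hypothetical; with PyTorch you would apply the same test to model.named_parameters() and set requires_grad accordingly. Note that the linear and layer-norm modules in Hugging Face's stock T5 implementation are built without bias terms, so a name-based filter like this would select nothing on an unmodified T5; check how configs/baselines/bitfit.json handles this.

```python
def is_bias(name: str) -> bool:
    """True for parameters whose last name component is 'bias'."""
    return name.rsplit(".", 1)[-1] == "bias"


def bitfit_trainable(param_names):
    """Names BitFit would update; everything else stays frozen."""
    return [name for name in param_names if is_bias(name)]


# Hypothetical parameter names of a small model with biased layers.
names = ["fc1.weight", "fc1.bias", "norm.weight", "norm.bias", "fc2.weight"]
print(bitfit_trainable(names))  # ['fc1.bias', 'norm.bias']
```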
Trained checkpoints
Source prompts
T5-base
To download the trained source prompts for T5-base, please run the command below:
wget https://homes.cs.washington.edu/~akari/models/attempt/attempt_large_source.zip
unzip attempt_large_source
T5-large
wget https://homes.cs.washington.edu/~akari/models/attempt/source_prompts.zip
unzip source_prompts
T5-3B
wget https://homes.cs.washington.edu/~akari/models/attempt/attempt_3b_source.zip
unzip attempt_3b_source
Pretrained attention weights
wget https://homes.cs.washington.edu/~akari/models/attempt/attn_pretrain_nlu.zip
Target task embeddings
The target task embeddings are available on Google Drive.
For example, you can download the checkpoints below and reproduce our paper results by running the following commands.
- SuperGLUE ATTEMPT-mt (attempt_mt_superglue.zip)
python eval_seq2seq.py configs/attempt/eval_superglue.json
- GLUE ATTEMPT-mt (attempt_mt_glue.zip)
python eval_seq2seq.py configs/attempt/eval_glue.json
Note: the current eval_seq2seq.py script assumes that all tasks use the same metrics, so tasks that use different metrics need to be evaluated separately. I'll later add support for multiple metrics for an easier evaluation pipeline.
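Until multi-metric support lands, one way to organize this is to group tasks by the metrics they use and run eval_seq2seq.py once per group. A minimal sketch, where group_by_metric is a hypothetical helper and the task-to-metric mapping is a placeholder to replace with the real metrics of your tasks:

```python
from collections import defaultdict


def group_by_metric(task_metrics):
    """Map each distinct metric set to the tasks that use exactly that set."""
    groups = defaultdict(list)
    for task, metrics in task_metrics.items():
        groups[tuple(sorted(metrics))].append(task)  # order-insensitive key
    return dict(groups)


# Placeholder task names and metrics.
groups = group_by_metric({
    "task_a": ["accuracy"],
    "task_b": ["accuracy"],
    "task_c": ["f1", "accuracy"],
})
for metrics, tasks in groups.items():
    print(metrics, tasks)
# ('accuracy',) ['task_a', 'task_b']
# ('accuracy', 'f1') ['task_c']
```

Each group would then get its own evaluation config and its own eval_seq2seq.py run.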
Citation and Contact
If you find this repository helpful, please cite our paper.
@inproceedings{asai2022attempt,
title={Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts},
author={Asai, Akari and Salehi, Mohammadreza and Peters, Matthew E. and Hajishirzi, Hannaneh},
booktitle={EMNLP},
year={2022}
}
If you have any questions about the paper, feel free to contact Akari Asai (akari[at]cs.washington.edu) or open an issue and mention @AkariAsai.