Continual Prompt Tuning for Dialog State Tracking

This is the official code for "Continual Prompt Tuning for Dialog State Tracking" (ACL 2022).

Requirements

Datasets

We conduct experiments on the Schema-Guided Dialogue (SGD) dataset. We select 15 services from the dataset and run most of our experiments on them. To prepare the data, download the dataset and place it under the data/ directory.
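
A minimal sketch of the download step, assuming the preprocessing code reads the raw SGD JSON files from data/; the clone target and layout below are assumptions, so adjust them to whatever utils/preprocess.py actually expects:

```bash
# Sketch only: fetch the Schema-Guided Dialogue dataset and place it under
# data/. The exact layout expected by utils/preprocess.py is an assumption --
# check that script and adjust the paths as needed.
git clone https://github.com/google-research-datasets/dstc8-schema-guided-dialogue.git
mkdir -p data
cp -r dstc8-schema-guided-dialogue/train \
      dstc8-schema-guided-dialogue/dev \
      dstc8-schema-guided-dialogue/test data/
```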

If you are interested in the pre-processing, please check utils/preprocess.py and utils/dataloader.py.

Reproduce results in our paper

Our baseline scripts are contained in baseline_scripts and our prompt-tuning scripts are in prompt_scripts.

To reproduce the results in Table 1, use the scripts below (a minimal invocation sketch follows the table):

| Methods | avg. JGA | Script Path |
| --- | --- | --- |
| Fine-tuning | 14.3 | baseline_scripts/run_t5_vanilla.sh |
| EWC | 13.9 | baseline_scripts/run_t5_ewc.sh |
| Replay | 58.6 | baseline_scripts/run_t5_replay.sh |
| AdapterCL (20x) | 49.8 | baseline_scripts/run_t5_adapter.sh |
| AdapterCL (1x) | 30.6 | baseline_scripts/run_t5_small_adapter.sh |
| Prompt Tuning | 48.1 | prompt_scripts/run_train_pt_randinit.sh |
| Continual Prompt Tuning | 59.5 | prompt_scripts/run_train_cpt_augq.sh |
| w/ memory | 60.7 | prompt_scripts/run_train_cpt_w_mem_augq.sh |
| w/ memory & backward | 61.2 | prompt_scripts/run_train_cpt_w_mem_augq_bwd.sh |
| Multi-Task Prompt Tuning | 64.0 | prompt_scripts/run_multiprompt.sh |
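
Each entry is a standalone shell script; a minimal invocation sketch, assuming the scripts are run from the repository root with your environment already set up:

```bash
# Sketch: run a script from the repository root. Pinning the GPU with
# CUDA_VISIBLE_DEVICES is an assumption -- the scripts may configure devices
# themselves; check the script you are running.
CUDA_VISIBLE_DEVICES=0 bash baseline_scripts/run_t5_vanilla.sh
```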

To reproduce the results in Table 2, use the following scripts:

| Techniques | avg. JGA | Script Path |
| --- | --- | --- |
| - | 29.6 | prompt_scripts/run_train_noMSR.sh |
| CLInit | 41.8 | prompt_scripts/run_train_noMSR_cl.sh |
| MSR | 48.1 | prompt_scripts/run_train_pt_randinit.sh |
| MSR+CLInit | 57.6 | prompt_scripts/run_train_cpt.sh |
| MSR+CLInit+QF | 59.5 | prompt_scripts/run_train_cpt_augq.sh |
| MSR+CLInit+MR | 60.4 | prompt_scripts/run_train_cpt_w_mem.sh |
| MSR+CLInit+QF+MR | 60.7 | prompt_scripts/run_train_cpt_w_mem_augq.sh |

To reproduce the results in Table 3, use the following scripts:

| Initialization | avg. JGA on 5 seeds | Script Path |
| --- | --- | --- |
| RandomInit | 48.1 | prompt_scripts/run_train_pt_randinit.sh |
| SelectInit | 54.5 | prompt_scripts/run_train_pt_selectinit.sh |
| CLInit | 57.6 | prompt_scripts/run_train_cpt.sh |

To reproduce the results in Table 4, refer to prompt_scripts/run_train_cpt.sh and set --dataset_order to one of 30/1/31/32.
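
A hypothetical sketch of running all four settings; forwarding the flag on the command line is an assumption, so check prompt_scripts/run_train_cpt.sh for how --dataset_order is actually set:

```bash
# Hypothetical sketch: loop over the Table 4 dataset orders. Passing the flag
# as an extra argument is an assumption -- if the script does not forward
# arguments, edit --dataset_order inside run_train_cpt.sh instead.
for ORDER in 30 1 31 32; do
    bash prompt_scripts/run_train_cpt.sh --dataset_order "$ORDER"
done
```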

To reproduce the results in Figure 3 and Table 5, refer to prompt_scripts/run_train_cpt_augq.sh and set --model_name_or_path to the path of your t5-base or t5-large pre-trained weights. We use an RTX 2080 GPU with 11 GB of memory. In our experiments, we use a batch size of 4 with 4 gradient accumulation steps for t5-base, and a batch size of 2 with 8 gradient accumulation steps for t5-large. Both models are trained in fp32.
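
A sketch of the model-size switch, assuming the script accepts the flag on its command line; the checkpoint path is a placeholder, and the batch-size and accumulation settings are assumed to live inside the script itself:

```bash
# Sketch: switch the backbone to a local t5-large checkpoint. Whether the flag
# can be passed on the command line or must be edited inside the script is an
# assumption; batch size and accumulation steps are set inside the script.
bash prompt_scripts/run_train_cpt_augq.sh --model_name_or_path /path/to/t5-large
```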

To reproduce the results in Table 6 and Table 7, set MEM_SIZE accordingly. To make each domain's memory size proportional to its training data size, use the --no_memory_uniform_dist flag.
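
A sketch of a memory-replay run; treating MEM_SIZE as an environment variable and appending the flag are both assumptions, so check prompt_scripts/run_train_cpt_w_mem_augq.sh for how they are actually wired in:

```bash
# Sketch only: MEM_SIZE as an environment variable and flag forwarding are
# assumptions about how the memory scripts are parameterized -- edit the
# script directly if they are not read this way. 50 is an example value.
MEM_SIZE=50 bash prompt_scripts/run_train_cpt_w_mem_augq.sh --no_memory_uniform_dist
```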

Evaluation

To evaluate non-prompt experiments, refer to gather_res_baseline.py.

To evaluate prompt-tuning experiments, refer to gather_res_prompt.py.

To compute forward transfer (FWT) for non-prompt experiments, refer to gather_res_baseline_fwt.py.

To compute FWT for prompt-tuning experiments, refer to gather_res_prompt_fwt.py.

Citation

```bibtex
@inproceedings{zhu-etal-2022-cpt4dst,
    title = "Continual Prompt Tuning for Dialog State Tracking",
    author = "Zhu, Qi and Li, Bing and Mi, Fei and Zhu, Xiaoyan and Huang, Minlie",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    publisher = "Association for Computational Linguistics",
    year = "2022",
}
```