


This repository contains code and data for the AAAI'2022 paper "GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection".

Full version with Appendix: [PDF]


Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.

Main Results

GALAXY performs end-to-end dialog modeling and achieves new state-of-the-art results on four task-oriented dialog (TOD) benchmark datasets: MultiWOZ2.0, MultiWOZ2.1, In-Car Assistant and CamRest.

| End-to-End Modeling | Match | SuccF1 | BLEU | Combined Score |
|---|---|---|---|---|
| In-Car Assistant | 85.26 | 83.60 | 23.03 | 107.46 |

:bangbang: New SOTA results on MultiWOZ (End-to-End Modeling & Policy Optimization), evaluated with the standardized scoring scripts that are officially recommended for fair evaluation. We have also added the new results to the official leaderboard and the new predictions to this repository.

| MultiWOZ | Inform | Success | BLEU | Combined Score |
|---|---|---|---|---|
| End-to-End Modeling | 85.40 | 75.70 | 19.64 | 100.2 |
| Policy Optimization | 92.80 | 83.50 | 19.92 | 108.1 |
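
The Combined Score columns follow the usual MultiWOZ convention, (Inform + Success) / 2 + BLEU; for In-Car Assistant, Match and SuccF1 take the places of Inform and Success. The awk-based helper below is a hypothetical convenience, not part of this repository, that recomputes the reported values from the other columns.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GALAXY):
# combined score = (Inform + Success) / 2 + BLEU
combined_score() {
  # $1 = Inform (or Match), $2 = Success (or SuccF1), $3 = BLEU
  awk -v inform="$1" -v success="$2" -v bleu="$3" \
    'BEGIN { printf "%.2f\n", (inform + success) / 2 + bleu }'
}

combined_score 85.40 75.70 19.64   # MultiWOZ, End-to-End Modeling: prints 100.19
combined_score 92.80 83.50 19.92   # MultiWOZ, Policy Optimization: prints 108.07
combined_score 85.26 83.60 23.03   # In-Car Assistant: prints 107.46
```

The MultiWOZ table rounds these sums to one decimal (100.2 and 108.1).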

Requirements

- torch == 1.8.0+cu111
- scikit-learn == 0.23.1
- numpy == 1.18.5
- nltk == 3.5
- spacy == 2.3.5
- scipy == 1.5.0
- regex == 2020.6.8
- tqdm == 4.60.0

We use the tokenizer from spaCy. You can install the Python packages and the spaCy English model with the following commands:

pip install -r requirements.txt
python -m spacy download en_core_web_sm


Path Definition

Set your own paths <YOUR_PROJECT_PATH> and <YOUR_SAVE_PATH> in the scripts as follows:

PROJECT_NAME="GALAXY"  # project name (fixed)
PROJECT_ROOT=<YOUR_PROJECT_PATH>/${PROJECT_NAME}  # root path of this project
SAVE_ROOT=<YOUR_SAVE_PATH>/${PROJECT_NAME}  # root path of model's output

Data Preparation

Download data from this link.

The downloaded zip file data.zip contains the pre-training corpora and the four already-processed TOD benchmark datasets: MultiWOZ2.0, MultiWOZ2.1, In-Car Assistant and CamRest. Put the unzipped directory data into the project directory GALAXY before training.


Pre-training Corpora

Pre-trained Checkpoint

Unzip the downloaded model file model.zip, then put the unzipped directory model into the project directory GALAXY for further fine-tuning.
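
To catch misplaced directories before a long run, the sketch below checks that data and model sit inside ${PROJECT_ROOT} and reports whether outputs (only needed to reproduce results with the released fine-tuned checkpoints) sits inside ${SAVE_ROOT}. The function check_layout is hypothetical and not part of the repository; it only checks that the directories exist, not what they contain.

```shell
#!/bin/sh
# Hypothetical sanity check (not part of GALAXY): verify the directory layout
# described in this README before launching training or inference.
check_layout() {
  project_root="$1"   # the value of ${PROJECT_ROOT}
  save_root="$2"      # the value of ${SAVE_ROOT}
  status=0
  # data/ and model/ are required for training and fine-tuning
  for dir in "$project_root/data" "$project_root/model"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir" >&2
      status=1
    fi
  done
  # outputs/ only matters when reproducing results with the released checkpoints
  if [ ! -d "$save_root/outputs" ]; then
    echo "note: $save_root/outputs not found (needed only for released checkpoints)" >&2
  fi
  return "$status"
}
```

For example, check_layout "${PROJECT_ROOT}" "${SAVE_ROOT}" && echo "layout OK" returns non-zero if data or model is missing.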

Pre-training

We pre-train GALAXY on limited labeled dialogs (UniDA) and large-scale unlabeled dialog corpora (UnDial) via semi-supervised learning. To pre-train GALAXY from scratch, run the following scripts:

# Step 1: Preprocess pre-training corpora
sh scripts/pre_train/preprocess.sh

# Step 2.1: Multi-GPU training on one machine
sh scripts/pre_train/train_single.sh

# Step 2.2: Multi-GPU training across multiple machines (distributed training)
sh scripts/pre_train/train_multi.sh

NOTE: For multi-GPU training, run either Step 2.1 or Step 2.2, not both. Step 2.2 requires a well-equipped GPU cluster.
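
The choice between Step 2.1 and Step 2.2 can be sketched as a small wrapper that runs preprocessing first and then exactly one training script. Both function names and the machine-count argument are hypothetical; the actual GPU and node settings live inside the provided scripts, which this sketch does not modify.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of GALAXY). Run it from the project root,
# since the script paths below are relative, as in the commands above.
choose_pretrain_script() {
  # $1 = number of machines (defaults to 1)
  if [ "${1:-1}" -gt 1 ]; then
    echo "scripts/pre_train/train_multi.sh"    # Step 2.2: multiple machines
  else
    echo "scripts/pre_train/train_single.sh"   # Step 2.1: one machine
  fi
}

run_pretraining() {
  sh scripts/pre_train/preprocess.sh || return 1   # Step 1
  sh "$(choose_pretrain_script "$1")"               # Step 2.1 or 2.2, never both
}
```

For example, run_pretraining 1 preprocesses the corpora and then launches train_single.sh.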


Fine-tuned Checkpoints

Download checkpoints from this link.

The downloaded zip file outputs.zip contains our best fine-tuned checkpoints on different datasets:

To reproduce our reported results, put the unzipped directory outputs into the directory ${SAVE_ROOT} (set in the scripts). You can then directly run the inference scripts for each dataset, which are described below.

Fine-tuning

We fine-tune GALAXY on the four TOD datasets, focusing on the end-to-end dialog modeling (E2E) task. To fine-tune GALAXY from scratch, run the following training scripts:

# Training on MultiWOZ2.0 (8 GPUs)
sh scripts/multiwoz2.0/train.sh

# Training on MultiWOZ2.1 (8 GPUs)
sh scripts/multiwoz2.1/train.sh

# Training on In-Car Assistant (1 GPU)
sh scripts/kvret/train.sh

# Training on CamRest (1 GPU)
sh scripts/camrest/train.sh

NOTE: For MultiWOZ2.0 and MultiWOZ2.1, we keep the DA prediction task during fine-tuning to reduce the discrepancy between pre-training and fine-tuning. We drop this task on In-Car Assistant and CamRest because these two datasets lack useful DAs. Both multi-GPU and single-GPU training are supported; when training on a single GPU, jointly tune the hyper-parameters ${BATCH_SIZE} and ${GRADIENT_ACCUMULATION_STEPS} to keep the originally specified effective batch size.
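
The batch-size adjustment is simple arithmetic. Assuming ${BATCH_SIZE} is a per-GPU value (check the script you run), the effective batch is BATCH_SIZE x GRADIENT_ACCUMULATION_STEPS x number of GPUs, so moving from 8 GPUs to 1 means multiplying the accumulation steps by 8, or raising the batch size if memory allows. The helper names and numbers below are illustrative, not the settings shipped in the scripts.

```shell
#!/bin/sh
# Sketch under the assumption that BATCH_SIZE is per GPU.
effective_batch() {
  # $1 = BATCH_SIZE, $2 = GRADIENT_ACCUMULATION_STEPS, $3 = number of GPUs
  echo $(( $1 * $2 * $3 ))
}

# Accumulation steps needed to reach a target effective batch on a given GPU count.
accumulation_steps_for() {
  want=$1; batch=$2; gpus=$3
  per_step=$(( batch * gpus ))
  if [ $(( want % per_step )) -ne 0 ]; then
    echo "cannot match $want exactly with BATCH_SIZE=$batch on $gpus GPU(s)" >&2
    return 1
  fi
  echo $(( want / per_step ))
}

# Hypothetical values, not the shipped configuration:
target=$(effective_batch 16 1 8)        # 8 GPUs x 16 per GPU = 128
accumulation_steps_for "$target" 16 1   # prints 8
```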

Inference

Once you have fine-tuned checkpoints (either our released ones or your own), you can run inference on the test sets of these datasets with the following scripts:

# Inference on MultiWOZ2.0 (1 GPU)
sh scripts/multiwoz2.0/infer.sh

# Inference on MultiWOZ2.1 (1 GPU)
sh scripts/multiwoz2.1/infer.sh

# Inference on In-Car Assistant (1 GPU)
sh scripts/kvret/infer.sh

# Inference on CamRest (1 GPU)
sh scripts/camrest/infer.sh
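
If you want all four test-set results in one go, the scripts above can simply be run one after another, since each needs only one GPU. The function below is a hypothetical convenience, and DRY_RUN is a switch invented for this sketch only; the GALAXY scripts do not read it.

```shell
#!/bin/sh
# Hypothetical convenience loop (not part of GALAXY); run from the project root.
run_all_inference() {
  for dataset in multiwoz2.0 multiwoz2.1 kvret camrest; do
    script="scripts/${dataset}/infer.sh"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "sh ${script}"          # only print what would run
    else
      sh "${script}" || { echo "inference failed: ${script}" >&2; return 1; }
    fi
  done
}
```

Calling DRY_RUN=1 run_all_inference prints the four commands without executing them; run_all_inference stops at the first failing script.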

NOTE: For reproduction, the best hyper-parameters are already set in the corresponding scripts. If you fine-tune GALAXY from scratch yourself, the checkpoints from the 4th to the 7th of the 60 training epochs tend to give the best inference performance on MultiWOZ2.0/2.1.
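
When fine-tuning from scratch, one way to act on this is to evaluate the checkpoints from epochs 4 to 7 and keep the one with the highest combined score. The sketch assumes you have collected one "epoch score" pair per line yourself; neither this format nor best_epoch comes from the repository, and the scores in the example are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical helper: read "epoch score" lines on stdin and print the epoch
# with the highest score among epochs 4-7 (exit status 1 if none qualify).
best_epoch() {
  awk '$1 >= 4 && $1 <= 7 {
         if (!seen || $2 > best) { best = $2; epoch = $1; seen = 1 }
       }
       END { if (seen) print epoch; else exit 1 }'
}

# Placeholder scores, not real results:
printf '3 9\n4 10\n5 12\n6 11\n7 10\n' | best_epoch   # prints 5
```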


Citation

If you use our code or find GALAXY useful in your work, please cite our paper as:

  title={GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection},
  author={He, Wanwei and Dai, Yinpei and Zheng, Yinhe and Wu, Yuchuan and Cao, Zheng and Liu, Dermot and Jiang, Peng and Yang, Min and Huang, Fei and Si, Luo and others},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},

Contact

For personal communication related to GALAXY, please contact Wanwei He (ww.he@siat.ac.cn).