CrossWOZ

CrossWOZ is the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances across 5 domains: hotel, restaurant, attraction, metro, and taxi. The corpus also contains rich annotations of dialogue states and dialogue acts on both the user and system sides. We also provide a user simulator and several benchmark models for pipelined task-oriented dialogue systems, which make it easier for researchers to compare and evaluate their models on this corpus.

Refer to our paper for more details: CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset (accepted by TACL)

If you have any questions, feel free to open an issue.

Annotation Platform

We have also released our annotation platform (Sep 10, 2021), which supports two annotators conversing synchronously and making annotations online. Please refer to the web directory.

Data

An example dialogue (hotel names are replaced by A, B, and C for simplicity):

[Figure: example dialogue]

The data is in the data/crosswoz directory. Data statistics:

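| Split | Train | Valid | Test |
| --- | --- | --- | --- |
| # dialogues | 5,012 | 500 | 500 |
| # turns (utterances) | 84,692 | 8,458 | 8,476 |
| Vocab | 12,502 | 5,202 | 5,143 |
| Avg. sub-goals | 3.24 | 3.26 | 3.26 |
| Avg. semantic tuples | 14.8 | 14.9 | 15.0 |
| Avg. turns | 16.9 | 16.9 | 17.0 |
| Avg. tokens per turn | 16.3 | 16.3 | 16.2 |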
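The "Avg. turns" row can be reproduced from the dialogue and turn counts in the same table. The following is a minimal sketch of that arithmetic only; the `splits` dictionary and the `avg_turns` helper are illustrative names, not part of the released code.

```python
# Counts copied from the statistics table; the dict layout is an assumption
# made for this sketch, not the format of the released data files.
splits = {
    "train": {"dialogues": 5012, "turns": 84692},
    "valid": {"dialogues": 500, "turns": 8458},
    "test": {"dialogues": 500, "turns": 8476},
}


def avg_turns(split):
    """Average number of turns (utterances) per dialogue, to one decimal."""
    counts = splits[split]
    return round(counts["turns"] / counts["dialogues"], 1)


for name in splits:
    print(name, avg_turns(name))

# Corpus-wide totals, which round to the 6K sessions and 102K utterances
# quoted in the overview.
total_dialogues = sum(s["dialogues"] for s in splits.values())
total_turns = sum(s["turns"] for s in splits.values())
print(total_dialogues, total_turns)
```

The three splits together give 6,012 dialogues and 101,626 utterances, consistent with the rounded figures in the overview.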

According to the type of user goal, we group the dialogues in the training set into five categories (S, M, M+T, CM, and CM+T). Statistics for dialogues of each goal type in the training set:

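| Goal type | S | M | M+T | CM | CM+T |
| --- | --- | --- | --- | --- | --- |
| # dialogues | 417 | 1573 | 691 | 1759 | 572 |
| NoOffer rate | 0.10 | 0.22 | 0.22 | 0.61 | 0.55 |
| Multi-query rate | 0.06 | 0.07 | 0.07 | 0.14 | 0.12 |
| Goal change rate | 0.10 | 0.28 | 0.31 | 0.69 | 0.63 |
| Avg. dialogue acts | 1.85 | 1.90 | 2.09 | 2.06 | 2.11 |
| Avg. sub-goals | 1.00 | 2.49 | 3.62 | 3.87 | 4.57 |
| Avg. semantic tuples | 4.5 | 11.3 | 15.8 | 18.2 | 20.7 |
| Avg. turns | 6.8 | 13.7 | 16.0 | 21.0 | 21.6 |
| Avg. tokens per turn | 13.2 | 15.2 | 16.3 | 16.9 | 17.0 |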
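As a sanity check, the per-goal-type figures should aggregate back to the training-set statistics reported earlier (5,012 dialogues, 3.24 sub-goals, 14.8 semantic tuples, and 16.9 turns on average). The sketch below weights each goal type's average by its dialogue count; the variable names and the `weighted_mean` helper are illustrative and not part of the released code.

```python
# Values copied from the goal-type table, in the order S, M, M+T, CM, CM+T.
n_dialogues = [417, 1573, 691, 1759, 572]
avg_sub_goals = [1.00, 2.49, 3.62, 3.87, 4.57]
avg_semantic_tuples = [4.5, 11.3, 15.8, 18.2, 20.7]
avg_turns = [6.8, 13.7, 16.0, 21.0, 21.6]


def weighted_mean(values, weights):
    """Mean of per-group averages, weighted by group size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


total_dialogues = sum(n_dialogues)
train_sub_goals = round(weighted_mean(avg_sub_goals, n_dialogues), 2)
train_tuples = round(weighted_mean(avg_semantic_tuples, n_dialogues), 1)
train_turns = round(weighted_mean(avg_turns, n_dialogues), 1)

print(total_dialogues, train_sub_goals, train_tuples, train_turns)
```

Per-dialogue averages are weighted by dialogue count here. "Avg. tokens per turn" is left out because it is a per-turn quantity and would need to be weighted by the number of turns instead.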

We also provide the database in data/crosswoz/database.

Data format

Code

Please install via:

pip install -e .

Code:

Result:

[Figure: result]

Citing

Please cite our paper if you find the paper or the dataset helpful.

@article{zhu2020crosswoz,
  author = {Qi Zhu and Kaili Huang and Zheng Zhang and Xiaoyan Zhu and Minlie Huang},
  title = {Cross{WOZ}: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset},
  journal = {Transactions of the Association for Computational Linguistics},
  year = {2020}
}