Task-Oriented Dialogue Research Progress Survey

Content

<a name="intro"></a>Introduction

This repo is a dataset and method survey for task-oriented dialogue.

We investigated most existing dialogue datasets and summarized their basic information, such as a brief description, download link, and size.

We also include leaderboards for popular datasets to show research progress in the task-oriented dialogue field.

A Chinese intro & news for this project are available here

Please cite this repo as:

@misc{TaskOrientedDialogueSurvey,
  author = {Yutai Hou},
  title = {Task-Oriented Dialogue Research Progress Survey},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AtmaHou/Task-Oriented-Dialogue-Research-Progress-Survey/}},
  commit = {master}
}

<a name="updates"></a> Updates

This section records major updates for easy reference (see ./release_detail.md or click the links below):

<a name="call"></a> Call for Contributions

Contributions are welcome; you are encouraged to:

<a name="leader"></a> Leader Boards

The rankings are based on the published results of the related papers, and we try to keep them up-to-date. The rankings may be unfair, because the features used and the train/dev splits may differ across papers. However, they show the trend of research and should be helpful for anyone starting a project on task-oriented dialogue.

Dialogue State Tracking

The dialogue state tracking (DST) task aims to predict or represent the dialogue state, which usually contains a goal constraint, a set of requested slots, and the user's dialogue act.
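In practice, a dialogue state is a small structured record that is updated every turn. A minimal, illustrative sketch in Python (the MultiWOZ-style slot names are assumptions for the example, not taken from any particular dataset):

```python
# Minimal sketch of a dialogue state record (illustrative only; the exact
# fields and slot naming conventions differ across datasets and papers).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DialogueState:
    constraints: Dict[str, str] = field(default_factory=dict)  # goal constraint: slot -> value
    requested_slots: List[str] = field(default_factory=list)   # slots the user asked for
    user_act: str = ""                                         # user's dialogue act this turn

state = DialogueState(
    constraints={"restaurant-food": "italian", "restaurant-area": "centre"},
    requested_slots=["restaurant-phone"],
    user_act="inform",
)
print(state.constraints)
```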

MultiWOZ 2.0 - Dialogue State Tracking

Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora.

The new, corrected versions of the dataset are available at MultiWOZ 2.1 (2019), MultiWOZ 2.2 (2020).

Notice: Models marked with * are open-vocabulary based models.

| Model | Joint Acc. | Slot Acc. | Paper / Source |
| ----- | ---------- | --------- | -------------- |
| SOM-DST (BERT-large)* (Kim et al., 2020) | 52.32 | - | Efficient Dialogue State Tracking by Selectively Overwriting Memory |
| SOM-DST* (Kim et al., 2020) | 51.72 | - | Efficient Dialogue State Tracking by Selectively Overwriting Memory |
| SAS (Hu et al., 2020) | 51.03 | 97.20 | SAS: Dialogue State Tracking via Slot Attention and Slot Information Sharing |
| MERET (Huang et al., 2020) | 50.91 | 97.07 | Meta-Reinforced Multi-Domain State Generator for Dialogue Systems |
| NADST* (Le et al., 2020) | 50.52 | - | Non-Autoregressive Dialog State Tracking |
| TRADE* (Wu et al., 2019) | 48.62 | 96.92 | Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems |
| SUMBT (Lee et al., 2019) | 46.649 | 96.44 | SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking |
| HyST* (Goel et al., 2019) | 44.24 | - | HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking |
| Neural Reading (Gao et al., 2019) | 41.10 | - | Dialog State Tracking: A Neural Reading Comprehension Approach |
| GLAD (Zhong et al., 2018) | 35.57 | 95.44 | Global-Locally Self-Attentive Dialogue State Tracker |
| MDBT (Ramadan et al., 2018) | 15.57 | 89.53 | Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing |
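The Joint Acc. and Slot Acc. columns above are the two standard DST metrics: joint accuracy credits a turn only if every slot value matches the gold state, while slot accuracy scores each slot independently. A minimal sketch of how they are typically computed (exact conventions, e.g. how empty slots are handled, vary slightly across papers):

```python
# Minimal sketch of joint accuracy vs. slot accuracy for dialogue state tracking.
# The slot names and values below are made up for illustration.
def dst_metrics(predictions, golds, all_slots):
    joint_hits, slot_hits = 0, 0
    for pred, gold in zip(predictions, golds):
        matches = [pred.get(s, "none") == gold.get(s, "none") for s in all_slots]
        joint_hits += int(all(matches))   # turn is correct only if every slot matches
        slot_hits += sum(matches)         # each slot is scored independently
    n_turns = len(golds)
    return joint_hits / n_turns, slot_hits / (n_turns * len(all_slots))

all_slots = ["restaurant-food", "restaurant-area"]
golds = [{"restaurant-food": "italian", "restaurant-area": "centre"}]
preds = [{"restaurant-food": "italian"}]  # area missed
print(dst_metrics(preds, golds, all_slots))  # (0.0, 0.5)
```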

DSTC2 - Dialogue State Tracking

Clarification of dataset types:

The main results we list here are obtained on the pure DSTC2 dataset (ASR n-best).

However, we do not list results from other DSTC2 variants, such as DSTC2-text (which formulates dialogue state tracking as a machine reading problem, reading the dialogue transcriptions multiple times and answering questions about each slot; see the paper for details) and DSTC2-cleaned (used by the NBT paper; it fixes ASR noise and typos during training and keeps ASR noise during testing; the cleaned version is available here).

| Model | Area | Food | Price | Joint | Paper / Source |
| ----- | ---- | ---- | ----- | ----- | -------------- |
| Liu et al. (2018) | 90 | 84 | 92 | 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
| Neural belief tracker (Mrkšić et al., 2017) | 90 | 84 | 94 | 72 | Neural Belief Tracker: Data-Driven Dialogue State Tracking |
| RNN (Henderson et al., 2014) | 92 | 86 | 86 | 69 | Robust dialog state tracking using delexicalised recurrent neural networks and unsupervised adaptation |

NLU: Slot Filling

The slot filling task aims to recognize key entities within the user utterance, such as locations and times.
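Slot values are usually annotated as BIO tag sequences over the utterance tokens, and slot F1 is computed over the recovered spans. A minimal sketch of decoding BIO tags into slot-value pairs (the tokens and tag names are invented for illustration):

```python
# Minimal sketch: decode BIO slot tags into (slot, value) pairs.
def bio_to_slots(tokens, tags):
    slots, name, value = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):             # a new slot value starts
            if name:
                slots.append((name, " ".join(value)))
            name, value = tag[2:], [token]
        elif tag.startswith("I-") and name:  # continue the current value
            value.append(token)
        else:                                # "O" closes any open value
            if name:
                slots.append((name, " ".join(value)))
            name, value = None, []
    if name:
        slots.append((name, " ".join(value)))
    return slots

tokens = ["play", "jazz", "in", "the", "living", "room"]
tags = ["O", "B-genre", "O", "O", "B-room", "I-room"]
print(bio_to_slots(tokens, tags))  # [('genre', 'jazz'), ('room', 'living room')]
```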

Snips - Slot Filling

| Model | F1 | Paper / Source |
| ----- | -- | -------------- |
| Enc-dec (focus) + BERT | 97.17 | Code |
| Stack-Propagation + BERT (Qin et al., 2019) | 97.0 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| Joint BERT (Chen et al., 2019) | 97.0 | BERT for Joint Intent Classification and Slot Filling |
| BLSTM-CRF + ELMo word embedding | 96.92 | Code |
| Stack-Propagation (Qin et al., 2019) | 94.2 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 93.90 | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
| Capsule Neural Networks (Zhang et al., 2018) | 91.8 | Joint Slot Filling and Intent Detection via Capsule Neural Networks |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| BLSTM-CRF (Siddhant et al., 2018) | 88.78 | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 88.3 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |

ATIS - Slot Filling

Notice: The following works have abnormally high scores because they are considered to exploit special pre-processing steps: Bi-model-Decoder (Wang et al., 2018), Intent Gating + Self-atten. (Li et al., 2018), Atten.-Based (Liu and Lane, 2016).

| Model | F1 | Paper / Source |
| ----- | -- | -------------- |
| Bi-model-Decoder (Wang et al., 2018) | 96.89 | A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling |
| Intent Gating + Self-atten. (Li et al., 2018) | 96.52 | A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding |
| Stack-Propagation + BERT (Qin et al., 2019) | 96.10 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| Joint BERT (Chen et al., 2019) | 96.10 | BERT for Joint Intent Classification and Slot Filling |
| Atomic Concept (Su Zhu and Kai Yu, 2018) | 96.08 | Concept Transfer Learning for Adaptive Language Understanding |
| Atten.-Based + Delexicalization (Shin et al., 2018) | 96.08 | Slot Filling with Delexicalized Sentence Generation |
| Atten.-Based (Liu and Lane, 2016) | 95.98 | Attention-based recurrent neural network models for joint intent detection and slot filling |
| Stack-Propagation (Qin et al., 2019) | 95.90 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| Encoder-Decoder-Pointer (Zhai et al., 2017) | 95.86 | Neural Models for Sequence Chunking |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 95.62 | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
| Capsule Neural Networks (Zhang et al., 2018) | 95.2 | Joint Slot Filling and Intent Detection via Capsule Neural Networks |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 95.2 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 94.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |

NLU: Intent Detection

The intent detection task aims to classify user utterances into different domains or intents.
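Intent detection is utterance-level classification, and the Acc. columns below report plain accuracy. A toy sketch (the rule-based classifier and the example utterances are invented stand-ins for a real model and dataset):

```python
# Toy sketch of intent detection and its accuracy metric (illustrative only).
def predict_intent(utterance: str) -> str:
    text = utterance.lower()
    if "weather" in text or "rain" in text:
        return "GetWeather"
    if "play" in text:
        return "PlayMusic"
    return "Other"

examples = [
    ("will it rain in seattle tomorrow", "GetWeather"),
    ("play some jazz in the kitchen", "PlayMusic"),
    ("book a table for two", "BookRestaurant"),
]
correct = sum(predict_intent(u) == gold for u, gold in examples)
print(f"intent accuracy: {correct / len(examples):.2f}")  # 0.67
```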

Snips - Intent Detection

| Model | Acc. | Paper / Source |
| ----- | ---- | -------------- |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 99.29 | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
| Enc-dec (focus) + ELMo | 99.14 | Code |
| Stack-Propagation + BERT (Qin et al., 2019) | 99.0 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| Joint BERT (Chen et al., 2019) | 98.6 | BERT for Joint Intent Classification and Slot Filling |
| Stack-Propagation (Qin et al., 2019) | 98.0 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| Capsule Neural Networks (Zhang et al., 2018) | 97.7 | Joint Slot Filling and Intent Detection via Capsule Neural Networks |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 96.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |

ATIS - Intent Detection

Notice: The following works have abnormally high scores because they are considered to exploit special pre-processing steps: Bi-model-Decoder (Wang et al., 2018), Intent Gating + Self-atten. (Li et al., 2018), Atten.-Based (Liu and Lane, 2016), BLSTM (Zhang et al., 2016).

| Model | Acc. | Paper / Source |
| ----- | ---- | -------------- |
| BLSTM + BERT | 99.10 | Code |
| Bi-model-Decoder (Wang et al., 2018) | 98.99 | A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling |
| Intent Gating + Self-atten. (Li et al., 2018) | 98.77 | A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding |
| Atten.-Based (Liu and Lane, 2016) | 98.43 | Attention-based recurrent neural network models for joint intent detection and slot filling |
| BLSTM (Zhang et al., 2016) | 98.10 | A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding |
| Joint BERT (Chen et al., 2019) | 97.9 | BERT for Joint Intent Classification and Slot Filling |
| Stack-Propagation + BERT (Qin et al., 2019) | 97.5 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| ELMo + BLSTM-CRF (Siddhant et al., 2018) | 97.42 | Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents |
| Stack-Propagation (Qin et al., 2019) | 96.9 | A Stack-Propagation Framework with Token-level Intent Detection for Spoken Language Understanding |
| Capsule Neural Networks (Zhang et al., 2018) | 95.0 | Joint Slot Filling and Intent Detection via Capsule Neural Networks |
| Slot-Gated (Intent Atten.) (Goo et al., 2018) | 94.1 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Slot-Gated (Full Atten.) (Goo et al., 2018) | 93.6 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |

<a name="detail"></a> Dataset Introductions

See the data details here or in the Excel file.

The following information is included for each dataset:

Tip: The table below may not display completely; scroll right to see more.

| Name | Introduction | Links | Multi/Single Turn | Task Detail | Public Accessible | Size & Stats | Included Label | Missing Label |
| ---- | ------------ | ----- | ----------------- | ----------- | ----------------- | ------------ | -------------- | ------------- |
| Few-shot Slot Tagging Benchmark | 1. Dialogue slot tagging dataset for the few-shot learning setting<br>2. First few-shot sequence labeling benchmark (meta-episode style data format)<br>3. Also includes 5 NER datasets for few-shot sequence labeling evaluation. | Download: https://atmahou.github.io/attachments/ACL2020data.zip<br>Paper: https://arxiv.org/pdf/2006.05702.pdf | S | 7 dialogue tasks:<br>Weather, play music, search, add to list, book, movie<br>5 NER tasks | Yes | For each task, it contains 100 episodes.<br>Each episode contains a query set (20 samples) and a support set (1-shot & 5-shot) | Slots | Intent |
| Taskmaster-2 (2020) | 1. Unlike Taskmaster-1, which includes both written "self-dialogs" and spoken two-person dialogs, Taskmaster-2 consists entirely of spoken two-person dialogs.<br>2. Users were led to believe they were interacting with an automated system that “spoke” using text-to-speech (TTS)<br>3. Intents are labeled on slots | Download: https://github.com/google-research-datasets/Taskmaster/tree/master/TM-2-2020/data<br>Homepage: https://github.com/google-research-datasets/Taskmaster/tree/master/TM-2-2020 | M | 7 domains:<br>restaurants, food ordering, movies, hotels, flights, music, sports | Yes | 17,289 dialogs:<br>restaurants (3276)<br>food ordering (1050)<br>movies (3047)<br>hotels (2355)<br>flights (2481)<br>music (1602)<br>sports (3478) | NLU(Intent, Slots) | |
| JDDC Corpus 2020 | 1. A large-scale multimodal Chinese E-commerce conversation corpus.<br>2. Human2Human conversations | Download: https://jddc.jd.com/auth\_environment<br>Homepage: https://jddc.jd.com/description | M | Multimodal E-commerce conversation | Yes | Electronic: 130k dialogues, 950k utterances, 215k images.<br>Clothing: 116k dialogues, 810k utterances, 200k images. | Intents (only on images),<br>Database | NLU(Intent, Slots) |
| CrossWOZ | 1. CrossWOZ, the first large-scale Chinese cross-domain Wizard-of-Oz task-oriented dataset.<br>2. Encourages natural transitions across domains in conversation.<br>3. Provides a user simulator<br>4. Human2Human | Download: https://github.com/thu-coai/CrossWOZ<br>Paper: https://arxiv.org/pdf/2002.11893.pdf | M | 5 domains, including hotel, restaurant, attraction, metro, and taxi. | Yes | 5,012 dialogues,<br>84,692 turns,<br>16.9 avg. turns<br><br>Annotation:<br>72 slots, 7,871 values, 6 intents | User Goals,<br>State (Intent, Slots),<br>Database | API calls |
| JDDC Corpus 2019 | 1. A large-scale, real-scenario Chinese E-commerce conversation corpus.<br>2. Human2Human conversations covering task-oriented dialogue, chitchat and question answering.<br>3. Large scale: 1 million multi-turn dialogues, 20 million utterances.<br>4. Main task: dialogue generation | Download: http://jddc.jd.com/auth\_environment<br>Paper: https://arxiv.org/pdf/1911.09969.pdf | M | E-commerce conversation | Yes | Total: 1 million dialogues, 20 million utterances.<br>Annotation: 289 different intents<br>Challenge 1: 300 dialogues, 300 questions;<br>Challenge 2: 15 dialogues, 168 questions;<br>Challenge 3: 108 dialogues, 500 questions | Intent (machine labeled),<br>Database | Slot |
| CAIS | 1. Dialogue utterances from the Chinese Artificial Intelligence Speakers (CAIS), annotated with slot tags and intent labels. | Download: https://github.com/Adaxry/CM-Net<br>Paper: https://www.aclweb.org/anthology/D19-1097.pdf | S | Most are music-related tasks. | Yes | Train 7,995;<br>Dev 994;<br>Test 1,012;<br>11 intents, 75 slots | Intent<br>Slots | |
| Multimodal Dialogs (MMD) Dataset | 1. Multimodal conversations in the fashion domain.<br>2. Human-to-human<br>3. Contains annotation of query type (similar to intent)<br>4. Large size: 150K conversations | Download: https://amritasaha1812.github.io/MMD/<br>Paper: https://arxiv.org/abs/1704.00200 | M | Shopping Assistant | Yes | 150K conversation sessions | question-type (Intent)<br>State Type (17 types of dialogue state classes) | Slot |
| Taskmaster-1 (2019) | 1. A task-based dataset collected with two different procedures: Wizard-of-Oz and self conversation.<br>2. Encourages realism and diversity by not restricting speakers with a knowledge base.<br>3. Both Human2Machine and Human2Human dialogues | Download: https://g.co/dataset/taskmaster-1<br>Paper: https://arxiv.org/pdf/1909.05358.pdf | M | 6 domains: ordering pizza, creating auto repair appointments, setting up ride service, ordering movie tickets, ordering coffee drinks and making restaurant reservations. | Yes | Human-Human: 7,708 dialogues, 169,469 utterances<br>Human-Machine: 10,438 dialogues, 132,610 utterances | API calls,<br>Argument (Slot) | Intent |
| MetaLWOz | 1. Dialogue dataset for developing fast adaptation methods for conversation models (track in DSTC8).<br>2. Many domains and tasks: 47 domains and 227 tasks.<br>3. Suitable for meta learning.<br>4. Main task: dialogue generation | Download: https://www.microsoft.com/en-us/download/58389<br>Homepage: https://www.microsoft.com/en-us/research/project/metalwoz/ | M | 47 domains and 227 tasks. | Yes | 37,884 dialogues (>10 turns long),<br>47 domains and 227 tasks. | Only utterance | NLU(Intent, Slots) |
| Minecraft Dialogue corpus | 1. The goal of this project is to develop systems that can collaborate and communicate with each other to solve tasks in a 3D environment.<br>2. Human2Human<br>3. Main task: given the context and 3D scenes, generate a response. | Download: http://juliahmr.cs.illinois.edu/Minecraft/<br>Paper: https://www.aclweb.org/anthology/P19-1537.pdf | M | An "Architect" instructs the "Builder" to build a 3D structure. | Yes | 509 human-human dialogues;<br>15,926 utterances (train 6,548, dev 2,855, test 2,251 Architect utterances) | Golden utterance,<br>game log,<br>screenshots | NLU(Intent, Slots) |
| E-commerce Dialogue Corpus (EDC) | 1. Real-world conversations between customers and customer service staff from our E-commerce partners in Taobao<br>2. Main task: response selection | Download: https://github.com/cooelf/<br>Paper: https://arxiv.org/pdf/1806.09102.pdf | M | Contains 5 types of conversations: commodity consultation, logistics express, recommendation, negotiation and chitchat, based on over 20 commodities. | Yes | Dialogues 1,020,000,<br>Utterances 7,500,000. | Only utterance | NLU(Intent, Slots) |
| Schema-Guided Dialogue State Tracking (DSTC8) | 1. Largest to date & contains over 16k multi-domain conversations spanning 16 domains<br>2. Presents a schema-guided paradigm<br>3. Enables zero-shot generalization to new APIs | Download: https://github.com/google-research-datasets/dstc8-schema-guided-dialogue<br>Paper: https://arxiv.org/pdf/1909.05855.pdf | M | 16 domains: Alarm, Banks, Buses, Calendar, Events, Flights, Homes, Media, Messaging, Movies, etc. | Yes | Over 16k dialogues; multi-domain dialogues average 20.44 turns; 329,964 turns in total. | Schema for each service contains:<br>service_name and description,<br>slots,<br>intents | |
| MultiWOZ 2.0 | 1. Proposed by the EMNLP 2018 best paper.<br>2. Largest to date & contains multiple domains.<br>3. Human2Human<br>4. Goal changes are encouraged | Download: http://dialogue.mi.eng.cam.ac.uk/index.php/corpus/<br>Paper: https://arxiv.org/pdf/1810.00278.pdf | M | 7 domains:<br>Attraction, Hospital, Police, Hotel, Restaurant, Taxi, Train. | Yes | Total 10,438 dialogues;<br>the average number of turns is 8.93 and 15.39 for single- and multi-domain dialogues respectively;<br>115,434 turns in total. | Belief state<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | NLU(Intent, Slots) |
| Facebook Multilingual Task Oriented Dataset | 1. (Facebook) We release a dataset of around 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) for three task-oriented domains: ALARM, REMINDER, and WEATHER.<br>2. For cross-lingual natural language understanding | Download: https://fb.me/multilingual\_task\_oriented\_data<br>Paper: https://arxiv.org/pdf/1810.13327.pdf | S | 3 domains: Alarm, Reminder, Weather<br><br>3 languages: English, Spanish, Thai | Yes | English Train: 30,521<br>English Dev: 4,181<br>English Test: 8,621<br><br>Spanish Train: 3,617<br>Spanish Dev: 1,983<br>Spanish Test: 3,043<br><br>Thai Train: 2,156<br>Thai Dev: 1,235<br>Thai Test: 1,692 | Slot<br>Intent | |
| Medical DS | 1. Our dataset is collected from the pediatric department in a Chinese online healthcare community<br>2. Task-oriented dialogue system for automatic diagnosis | Download: http://www.sdspeople.fudan.edu.cn/zywei/data/acl2018-mds.zip<br>Paper: http://www.sdspeople.fudan.edu.cn/zywei/paper/liu-acl2018.pdf | M | Automatic Diagnosis | Yes | 4 diseases<br>67 symptoms | Slot<br>Action | |
| Snips | 1. Collected by Snips for model evaluation.<br>2. For natural language understanding<br>3. Homepage: https://medium.com/snips-ai/benchmarking-natural-language-understanding-systems-google-facebook-microsoft-and-snips-2b8ddcf9fb19 | Download: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06-custom-intent-engines | S | 7 tasks:<br>Weather, play music, search, add to list, book, movie | Yes | Train: 13,084<br>Test: 700<br>7 intents, 72 slot labels | Intent<br>Slots | |
| MIT Restaurant Corpus | 1. The MIT Restaurant Corpus is a semantically tagged training and test corpus in BIO format.<br>2. For natural language understanding | Download: https://groups.csail.mit.edu/sls/downloads/restaurant/ | S | Restaurant | Yes | Train 6,894; Dev 766; Test 1,521 | Slot | Intent |
| MIT Movie Corpus | 1. The MIT Movie Corpus is a semantically tagged training and test corpus in BIO format. The eng corpus contains simple queries, and the trivia10k13 corpus contains more complex queries.<br>2. For natural language understanding | Download: https://groups.csail.mit.edu/sls/downloads/movie/ | S | Movie | Yes | Train, Dev, Test<br>MIT Movie Eng 8,798 977 2,443<br>MIT Movie Trivia 7,035 781 1,953<br>Refer to: Data Augmentation for Spoken Language Understanding via Joint Variational Generation | Slot | Intent |
| ATIS | 1. The ATIS (Airline Travel Information Systems) dataset (Tur et al., 2010) is widely used in SLU research<br>2. For natural language understanding | Download:<br>1. https://github.com/AtmaHou/Bi-LSTM\_PosTagger/tree/master/data<br>2. https://github.com/yvchen/JointSLU/tree/master/data | S | Airline Travel Information | Yes | Train: 4,478<br>Test: 893<br>120 slot labels and 21 intents | Intent<br>Slots | |
| Microsoft Dialogue Challenge | 1. Contains human-annotated conversational data in three domains, and<br>2. an experiment platform with built-in simulators in each domain, for training and evaluation purposes. | Paper: https://arxiv.org/pdf/1807.11125.pdf | M | Movie-Ticket Booking<br>Restaurant Reservation<br>Taxi Ordering | Yes | Task Intents Slots Dialogues<br>Movie-Ticket Booking 11 29 2890<br>Restaurant Reservation 11 30 4103<br>Taxi Ordering 11 29 3094 | Intent<br>Slots | Database<br>API-call |
| CamRest676 | The CamRest676 Human2Human dataset contains the following three JSON files:<br>1. CamRest676.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.<br>2. CamRestDB.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.<br>3. The ontology file, specifying all the values the three informable slots can take. | Download: https://www.repository.cam.ac.uk/handle/1810/260970<br>Paper: https://arxiv.org/abs/1604.04562 | M | Booking a restaurant | Yes | Total 676 dialogues<br>Total 1,500 turns<br>Train:Dev:Test 3:1:1 (test set not given) | Slot<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | Intent<br>API call<br>Database |
| Human-human goal oriented dataset | 1. Maluuba released a travel booking dataset<br>2. Designed for a new task: frame tracking (allows comparison between history entities)<br>3. Homepage: https://datasets.maluuba.com/Frames<br>4. Human2Human | Download: https://datasets.maluuba.com/Frames/dl<br>Paper: https://arxiv.org/abs/1706.01690<br>https://1drv.ms/b/s!Aqj1OvgfsHB7dsg42yp2BzDUK6U | M | Travel Booking | Yes | Dialogues 1,369<br>Turns 19,986<br>Average user satisfaction (from 1-5) 4.58 | Frame<br>User agenda<br>User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>API Call<br>User's satisfaction<br>Task successful<br>Database<br>Entity reference | Intent |
| Dialog bAbI tasks data | 1. Facebook's task-oriented dialogue dataset consists of 6 different tasks.<br>2. The data for tasks 1-5 is constructed automatically from bots' chat (Bot2Bot), and the data for task 6 is simply a reformatted DSTC2 dataset.<br>3. A shared database is included.<br>4. This is the only task-oriented dataset among the bAbI tasks.<br>5. The goal is to evaluate end2end tasks, so there are no intents or slots. | Download: https://research.fb.com/downloads/babi/<br>Paper: http://arxiv.org/abs/1605.07683 | M | Book a table at a restaurant | Yes | For each task:<br>training 1000<br>develop 1000<br>test 1000<br><br>For tasks 1-5, a second test set (with suffix -OOV.txt) contains dialogs including entities not present in the training set. | API call<br>Full Database | Slot<br>Intent<br>User Act<br>Agent Act |
| Stanford Dialog Dataset | 1. Stanford NLP group's data for an in-car assistant agent.<br>2. Human2Human<br>3. A quick intro: http://m.sohu.com/n/499803391/ | Download: http://nlp.stanford.edu/projects/kvret/kvret\_dataset\_public.zip<br>Paper: https://arxiv.org/abs/1705.05414 | M | In-car assistant agent: schedule, weather, navigation | Yes | Training Dialogues 2,425<br>Validation Dialogues 302<br>Test Dialogues 304<br>Avg. # of Utterances Per Dialogue 5.25 | Dialogue-level database<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | API call<br>Intent<br>Slot |
| Stanford Dialog Dataset LU | 1. Stanford data labeled by HIT, relabeled with slots & intents<br>2. Human2Human<br>3. A quick intro to the Stanford data: http://m.sohu.com/n/499803391/<br>4. Annotation handbook: https://docs.google.com/document/d/1ROARKf8AJNnG2\_nPINe1Xm5Rza7V0jPnQV8io09hcFY/edit | N/A | M | In-car assistant agent: schedule, weather, navigation | No | Training Dialogues 2,425<br>Validation Dialogues 302<br>Test Dialogues 304<br>Avg. # of Utterances Per Dialogue 5.25 | Slot<br>Intent<br>API call<br><br>Sample alignment is needed to get the following:<br>Dialogue-level database<br>User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>Agent Reply | |
| DSTC-2 | 1. Human2Bot restaurant booking dataset<br>2. For usage refer to: http://camdial.org/~mh521/dstc/downloads/handbook.pdf<br>3. Each dialogue is stored in a separate folder, which contains a log and a label. | http://camdial.org/~mh521/dstc/ | M | Booking a restaurant | Yes | Train 1,612 calls<br>Dev 506 calls<br>Test 1,117 dialogs | Slot<br>User Act (inform, request slots)<br>Agent Act (inform, request slots) | Intent<br>API call<br>Database |
| DSTC4 | 1. The data, named TourSG, consists of 35 dialog sessions on touristic information for Singapore, collected from Skype calls between three tour guides and 35 tourists<br>2. All the recorded dialogs, with a total length of 21 hours, have been manually transcribed and annotated with speech act and semantic labels at the turn level.<br>3. Homepage: http://www.colips.org/workshop/dstc4/data.html<br>4. Human2Human | N/A | M | Query touristic information | No | Train 20 dialogs<br>Test 15 dialogs | speech act (User & Agent)<br>semantic labels (Intent? User & Agent)<br>topic for turn (Intent?) | N/A |
| Movie Booking Dataset | 1. (Microsoft) Raw conversational data collected via Amazon Mechanical Turk, with annotations provided by domain experts.<br>2. Human2Human | Download: https://github.com/MiuLab/TC-Bot#data<br>Paper: TC-bot | M | Booking a movie | Yes | 280 dialogues;<br>turns per dialogue is approximately 11 | User Act (inform, request slots)<br>Agent Act (inform, request slots)<br>Intent<br>Slots | Database<br>API-call |
| Lingxi | 1. All data is single-turn user input that has already been word-segmented; there is considerable noise.<br>2. Part-of-speech tagging and slot labeling are provided<br>3. Language: Chinese | N/A | S | Conversational robot service user log | No | Utterances: 5,132 | Slot<br>POS | Agent reply<br>Intent<br>API call<br>Database |
| TOP semantic parsing | 1. (Facebook) A hierarchical semantic representation for task oriented dialog systems that can model compositional and nested queries (hierarchical intents and slots).<br>2. For natural language understanding<br>3. Human2Bot | Download: http://fb.me/semanticparsingdialog<br>Paper: https://arxiv.org/pdf/1810.07942.pdf | S | Navigation and event | Yes | Train 31,279 utterances<br>Dev 4,462 utterances<br>Test 9,042 utterances | Hierarchical intents<br>Slots | |
| SwDA | 1. The Switchboard Dialog Act Corpus (SwDA) extends the Switchboard-1 Telephone Speech Corpus, Release 2, with turn/utterance-level dialog-act tags.<br>2. The tags summarize syntactic, semantic, and pragmatic information about the associated turn. The SwDA project was undertaken at UC Boulder in the late 1990s. | Download: http://compprag.christopherpotts.net/swda.html<br>Instruction: https://web.stanford.edu/~jurafsky/ws97/manual.august1.html | S | Switchboard Dialog | Yes | Train: 197,489 utterances, 1,115 conversations<br>Test: 40 conversations<br>Annotation: 42 classes | Act | Slot |

<a name="acknowledgment"></a>Acknowledgment

Thanks for the support from my advisor Wanxiang Che.

Thanks for public contributions from: Shuai Lin, JiAnge, Su Zhu, seeledu, Tony Lin, Jason Krone, Libo Qin, HariiHe, Jelle Bosscher.