SLT 2018 Special Session - Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems

Schedule: Dec. 18, 2018, 1PM - 5PM, Olympia Plenary Room

  1. 1:00 – 1:10PM Jianfeng Gao: Opening remarks
  2. 1:10 – 1:40PM Gokhan Tur (Uber): "Past, Present, and Future of Conversational AI" (slides)
  3. 1:40 – 2:10PM Minlie Huang (Tsinghua): "Towards Building More Intelligent Conversational System: Semantics, Consistency & Interactiveness" (slides)
  4. 2:10 – 2:40PM Vivian Chen (NTU): "Towards Open-Domain Conversational AI" (slides)
  5. 2:40 – 3:00PM Break
  6. 3:00 – 3:20PM Sungjin Lee (MSR): "MS dialogue challenge: result and outlook" (slides)
  7. 3:20 – 3:35PM Oral presentation 1 by Sihong Liu: "Universe Model: A Human-like User Simulator Based on Dialogue Context" (slides)
  8. 3:35 – 3:50PM Oral presentation 2 by Yu-An Wang: "Double dueling Agent for Dialogue Policy Learning" (slides)
  9. 3:50 – 4:30PM Panel discussion (chaired by Jianfeng Gao). Panelists:
    • Alex Acero (Apple)
    • Vivian Chen (NTU)
    • Minlie Huang (Tsinghua)
    • Sungjin Lee (MSR)
    • Spyros Matsoukas (Amazon)
    • Gokhan Tur (Uber)

News

Task

This special session introduces a Dialogue Challenge for building end-to-end task-completion dialogue systems. Its goal is to encourage the dialogue research community to collaborate and benchmark on standard datasets and a unified experimental environment. For the challenge, we will release human-annotated conversational data in three domains (movie-ticket booking, restaurant reservation, and taxi ordering), together with an experiment platform that includes a built-in simulator for each domain, for training and evaluation. The final submitted systems will be evaluated both in the simulated setting and by human judges.

See the task description paper for more details about the task.

Data

In this dialogue challenge, we will release well-annotated datasets for three task-completion domains: movie-ticket booking, restaurant reservation, and taxi ordering. The table below shows the statistics of the three datasets.

| Task | Intents | Slots | Dialogues |
| ---------------------- | -- | -- | ---- |
| Movie-Ticket Booking | 11 | 29 | 2890 |
| Restaurant Reservation | 11 | 30 | 4103 |
| Taxi Ordering | 11 | 29 | 3094 |

Evaluation

As described in the task description (Section 4), we will evaluate the dialogue systems with both automatic and human evaluations on three criteria: success rate, average number of turns, and average reward.
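The three automatic metrics (success rate, average number of turns, and average reward) are plain averages over simulated episodes. The sketch below illustrates the arithmetic. It assumes each episode is logged as a hypothetical `Episode` record with a success flag, a turn count, and a cumulative reward. The platform's real log format and reward scheme may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    # Hypothetical per-episode record; the platform's real log format may differ.
    success: bool  # did the agent complete the user's goal?
    turns: int     # number of turns in the dialogue
    reward: float  # cumulative reward for the episode


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Return success rate, average turns, and average reward over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "avg_turns": sum(e.turns for e in episodes) / n,
        "avg_reward": sum(e.reward for e in episodes) / n,
    }
```

For example, `summarize([Episode(True, 10, 60.0), Episode(False, 40, -80.0)])` returns `{'success_rate': 0.5, 'avg_turns': 25.0, 'avg_reward': -10.0}`. The reward values here are made up for illustration.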

For the human evaluation, judges will interact with the final systems submitted by participants. In addition to the metrics above, at the end of each dialogue session each judge will rate the system on a scale of 1 to 5 based on its naturalness, coherence, and task-completion capability.
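One simple way to summarize the 1-to-5 judge ratings per system is a plain mean. The organizers' actual aggregation procedure is not specified here. The sketch below assumes the ratings arrive as hypothetical `(system_id, score)` pairs, with one pair per judged dialogue session.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def average_ratings(ratings: List[Tuple[str, int]]) -> Dict[str, float]:
    """Average the 1-5 judge ratings for each system.

    `ratings` holds one (system_id, score) pair per judged dialogue session.
    """
    by_system: Dict[str, List[int]] = defaultdict(list)
    for system_id, score in ratings:
        # Reject scores outside the 1-5 scale used by the judges.
        if not 1 <= score <= 5:
            raise ValueError(f"rating {score} for {system_id} is outside 1-5")
        by_system[system_id].append(score)
    return {sid: sum(s) / len(s) for sid, s in by_system.items()}
```

For example, `average_ratings([("A", 4), ("A", 5), ("B", 3)])` returns `{'A': 4.5, 'B': 3.0}`.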

Baseline Agents

System Submission Guidelines

Open an account at https://msrprograms.cloudapp.net/MDC2018 and create a submission that includes an abstract, your code as a zip file (<100MB), the trained agent model, and the NLU and NLG models if applicable. Include execution instructions like those below. Submissions can be updated any number of times until 10/14/2018 11:59 PM PST.
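The code archive must be smaller than 100MB. The sample submission packs the contents of system/src into run.zip. The sketch below automates both steps with Python's standard `zipfile` module. It is a hypothetical helper, not part of the challenge code, and the function names are illustrative. Reading "100MB" as 100 * 10**6 bytes is an assumption, chosen because it is the stricter interpretation.

```python
import os
import zipfile
from pathlib import Path

# Assumption: "100MB" is read conservatively as 100 * 10**6 bytes, which is
# also below a 100 MiB (100 * 2**20 bytes) reading of the limit.
SIZE_LIMIT_BYTES = 100 * 10**6


def zip_contents(src_dir: str, zip_path: str) -> int:
    """Zip the contents of src_dir (not the folder itself); return the archive size in bytes."""
    src = Path(src_dir)
    target = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside src_dir.
            if path.is_file() and path.resolve() != target:
                # Store paths relative to src_dir so files sit at the archive root.
                zf.write(path, arcname=path.relative_to(src).as_posix())
    return os.path.getsize(zip_path)


def check_size(zip_path: str) -> None:
    """Raise ValueError if the archive is not under the assumed size limit."""
    size = os.path.getsize(zip_path)
    if size >= SIZE_LIMIT_BYTES:
        raise ValueError(f"{zip_path} is {size} bytes; limit is {SIZE_LIMIT_BYTES}")
```

Typical use is `zip_contents("system/src", "run.zip")` followed by `check_size("run.zip")`. Storing relative paths means that extracting run.zip reproduces the layout of system/src rather than nesting it inside an extra folder.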

To run the sample submission in the SubmissionSample folder:

  1. Extract run.zip (run.zip is created by zipping the contents of system/src).

  2. Run testrun.py to interact with the agent, as in the example below:

    python testrun.py --agt 0 --usr 1 --max_turn 40 --kb_path ./run/deep_dialog/data_movie/movie.kb.1k.v1.p --goal_file_path ./run/deep_dialog/data_movie/user_goals_first.v2.p --slot_set ./run/deep_dialog/data_movie/slot_set.txt --act_set ./run/deep_dialog/data_movie/dia_acts.txt --dict_path ./run/deep_dialog/data_movie/slot_dict.v1.p --nlg_model_path "./run/deep_dialog/models/nlg/movie/lstm_tanh_[1533529279.91]87_99_199_0.988.p" --nlu_model_path "./run/deep_dialog/models/nlu/movie/lstm[1533588045.3]_38_38_240_0.998.p" --diaact_nl_pairs ./run/deep_dialog/data_movie/dia_act_nl_pairs.v7.json --intent_err_prob 0.00 --slot_err_prob 0.00 --episodes 500 --act_level 0 --run_mode 0 --cmd_input_mode 0
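When you experiment with several configurations, it can help to drive testrun.py from a script instead of retyping the command. The sketch below assumes a Python 3 launcher. The helper names and the `DATA`/`MODELS` constants are hypothetical, and the flag values are copied from the movie-domain command above. The interpreter for testrun.py defaults to `python`, as in that command. Because the arguments are passed as a list with no shell, the bracketed model filenames are never subject to glob expansion.

```python
import subprocess
from typing import Dict, List

# Hypothetical path prefixes matching the example command.
DATA = "./run/deep_dialog/data_movie"
MODELS = "./run/deep_dialog/models"

# Flag values copied from the example testrun.py command for the movie domain.
MOVIE_ARGS: Dict[str, str] = {
    "--agt": "0",
    "--usr": "1",
    "--max_turn": "40",
    "--kb_path": f"{DATA}/movie.kb.1k.v1.p",
    "--goal_file_path": f"{DATA}/user_goals_first.v2.p",
    "--slot_set": f"{DATA}/slot_set.txt",
    "--act_set": f"{DATA}/dia_acts.txt",
    "--dict_path": f"{DATA}/slot_dict.v1.p",
    "--nlg_model_path": f"{MODELS}/nlg/movie/lstm_tanh_[1533529279.91]87_99_199_0.988.p",
    "--nlu_model_path": f"{MODELS}/nlu/movie/lstm[1533588045.3]_38_38_240_0.998.p",
    "--diaact_nl_pairs": f"{DATA}/dia_act_nl_pairs.v7.json",
    "--intent_err_prob": "0.00",
    "--slot_err_prob": "0.00",
    "--episodes": "500",
    "--act_level": "0",
    "--run_mode": "0",
    "--cmd_input_mode": "0",
}


def build_command(args: Dict[str, str], script: str = "testrun.py",
                  python: str = "python") -> List[str]:
    """Turn a flag -> value mapping into an argv list for subprocess."""
    cmd = [python, script]
    for flag, value in args.items():
        cmd.extend([flag, value])
    return cmd


def run_testrun(args: Dict[str, str]) -> None:
    # A list argv (no shell) keeps the bracketed model filenames literal.
    subprocess.run(build_command(args), check=True)
```

To try a variant, copy the mapping, change one value (for example `{**MOVIE_ARGS, "--episodes": "100"}`), and pass the result to `run_testrun`.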

Organizers

Reference

If you submit any system to this challenge or publish any other work that uses the resources provided by this project, please cite the following papers:

@article{li2018microsoft,
  title={Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems},
  author={Li, Xiujun and Panda, Sarah and Liu, Jingjing and Gao, Jianfeng},
  journal={arXiv preprint arXiv:1807.11125},
  year={2018}
}

@article{li2016user,
  title={A User Simulator for Task-Completion Dialogues},
  author={Li, Xiujun and Lipton, Zachary C and Dhingra, Bhuwan and Li, Lihong and Gao, Jianfeng and Chen, Yun-Nung},
  journal={arXiv preprint arXiv:1612.05688},
  year={2016}
}

Contact

FAQ

  1. How to implement an agent: see here
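The agent interface itself is defined by the challenge code and is not reproduced here. The sketch below only illustrates the slot-filling behavior a task-completion agent needs. It shows a hypothetical rule-based policy that requests each missing slot in turn and then informs the user of the collected values. The class name, the method names, and the dialogue-act dict layout are assumptions, and the slot names in the usage example are placeholders.

```python
from typing import Dict, List


class RequestThenInformAgent:
    """Hypothetical rule-based policy: request each required slot, then inform.

    Method names and the dialogue-act dict layout are illustrative
    assumptions, not the challenge code's agent interface.
    """

    def __init__(self, required_slots: List[str]) -> None:
        self.required_slots = list(required_slots)
        self.filled: Dict[str, str] = {}

    def initialize_episode(self) -> None:
        # Forget everything collected in the previous dialogue.
        self.filled = {}

    def observe_user(self, inform_slots: Dict[str, str]) -> None:
        # Record slot values the user has just provided.
        self.filled.update(inform_slots)

    def next_action(self) -> Dict[str, object]:
        missing = [s for s in self.required_slots if s not in self.filled]
        if missing:
            # Ask for the first slot that is still unknown.
            return {"diaact": "request", "inform_slots": {},
                    "request_slots": {missing[0]: "UNK"}}
        # All required slots are known: report the collected constraints.
        return {"diaact": "inform", "inform_slots": dict(self.filled),
                "request_slots": {}}
```

A learned policy, such as the reinforcement-learning agents the challenge targets, would replace the fixed rule in `next_action` with a model that scores candidate actions given the dialogue state.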