

Sequence-to-sequence Data Augmentation for Dialogue Language Understanding

Generating diverse expansions of user input

Author: Atma

Update: 2018/6/7

Introduction

This repo contains the code for the COLING 2018 paper: Sequence-to-sequence Data Augmentation for Dialogue Language Understanding

Data

The ATIS data is provided in the Data directory.

Get the full Stanford LU data at link; it contains both slot and intent labels for the full dataset. If you use this data, please cite:

@inproceedings{hou2018coling,
	author    = {Yutai Hou and
	Yijia Liu and
	Wanxiang Che and
	Ting Liu},
	title     = {Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding},
	booktitle = {Proc. of COLING},
	pages     = {1234--1245},
	year      = {2018},
}

Get started

The following steps show how to use the code on the ATIS dataset.

Tips:

To remove the clustering effect for the baseline setting, i.e., to cluster all data into a single class:
python3 run_clustering.py -cm no_clustering -d atis
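To illustrate what "a single class" means, here is a minimal Python sketch of frame-based clustering. It assumes each training utterance arrives as (tokens, BIO slot tags, intent) and that clusters are keyed on the intent plus the set of slot types; the actual criterion in run_clustering.py may differ, and the function names are hypothetical. Passing no_clustering=True roughly corresponds to `-cm no_clustering`: every utterance lands in one cluster.

```python
from collections import defaultdict


def frame_key(intent, slot_tags):
    """Hypothetical cluster key: the intent plus the sorted set of slot types."""
    slot_types = sorted({tag[2:] for tag in slot_tags if tag != "O"})  # strip "B-"/"I-"
    return (intent, tuple(slot_types))


def cluster(utterances, no_clustering=False):
    """Group (tokens, slot_tags, intent) triples into clusters of token lists."""
    clusters = defaultdict(list)
    for tokens, tags, intent in utterances:
        # With no_clustering, everything shares one key, i.e. one class.
        key = "all" if no_clustering else frame_key(intent, tags)
        clusters[key].append(tokens)
    return dict(clusters)
```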

Tips:

There are several alternative baseline settings:

No clustering, full connect, no index:
python3 run_onmt_generation.py -gd -pm circle -ni -nc

Full connect, no index:
python3 run_onmt_generation.py -gd -pm full_connect -ni

Diverse connect, no index:
python3 run_onmt_generation.py -gd -ni

Diverse connect, no filtering:
python3 run_onmt_generation.py -gd -fr 1
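The connection modes decide which utterance pairs inside a cluster become seq2seq training pairs. The sketch below is a guess at their meaning, not the repo's code: full_connect uses every ordered pair of distinct utterances, circle is assumed to link each utterance to the next one in a ring, and diverse connect is assumed to pair each utterance with its k most different cluster-mates by token-level edit distance. The index (-ni) and filtering (-fr) options are not modelled, and make_pairs is a hypothetical name.

```python
from itertools import permutations


def edit_distance(a, b):
    """Token-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def make_pairs(cluster, mode="diverse", k=1):
    """Build (source, target) training pairs from one cluster of utterances."""
    n = len(cluster)
    if mode == "full_connect":
        # Every ordered pair of distinct utterances.
        return [(cluster[i], cluster[j]) for i, j in permutations(range(n), 2)]
    if mode == "circle":
        # Assumption: each utterance points to the next one, wrapping around.
        return [(cluster[i], cluster[(i + 1) % n]) for i in range(n)] if n > 1 else []
    if mode == "diverse":
        # Assumption: pair each utterance with its k most distant cluster-mates.
        pairs = []
        for i in range(n):
            others = sorted(
                (j for j in range(n) if j != i),
                key=lambda j: -edit_distance(cluster[i], cluster[j]),
            )
            pairs.extend((cluster[i], cluster[j]) for j in others[:k])
        return pairs
    raise ValueError(f"unknown mode: {mode}")
```

Full connection grows quadratically with cluster size, which is one plausible reason to prefer a sparser scheme on large clusters.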

Tips:

The same alternative baselines can also be run with -t atis_labeled -f:

No clustering, full connect, no index:
python3 run_onmt_generation.py -t atis_labeled -f -pm circle -ni -nc

Full connect, no index:
python3 run_onmt_generation.py -t atis_labeled -f -pm full_connect -ni

Diverse connect, no index:
python3 run_onmt_generation.py -t atis_labeled -f -ni

Diverse connect, no filtering:
CUDA_VISIBLE_DEVICES="1" python3 run_onmt_generation.py -t atis_labeled -f -fr 1
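The "no filtering" setting passes -fr 1. Assuming -fr is the fraction of generated utterances to keep, a filter of that kind could look like the sketch below. The per-candidate score (for example a seq2seq log-probability) is a hypothetical input supplied by the caller, and filter_generated is not a function from this repo; a ratio of 1 keeps every candidate.

```python
import math


def filter_generated(candidates, scores, ratio=1.0):
    """Keep the best-scoring `ratio` fraction of candidates, in their original order.

    ratio=1.0 keeps everything, i.e. no filtering.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    keep = math.ceil(len(candidates) * ratio)
    # Stable sort: equal scores keep their original relative order.
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:keep])
    return [candidates[i] for i in chosen]
```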

Tips:

For the surface-realization-only baseline:
python3 run_thesaurus.py -t atis_labeled -rf
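Surface realization turns a delexicalised utterance back into a labelled one by filling its slot placeholders with concrete values. A minimal sketch follows, assuming placeholders are written as `<slot_name>` and that the thesaurus maps each slot to a list of observed values (both are assumptions about run_thesaurus.py; realize is a hypothetical name). It also emits the matching BIO tags so the output can be used directly as slot-filling training data.

```python
import random


def realize(template, slot_values, rng):
    """Fill '<slot>' placeholders in a token template; return (tokens, BIO tags).

    slot_values maps a slot name to a list of candidate values (each a token list).
    """
    tokens, tags = [], []
    for tok in template:
        if tok.startswith("<") and tok.endswith(">"):
            slot = tok[1:-1]
            value = rng.choice(slot_values[slot])  # pick one observed value
            tokens.extend(value)
            tags.extend(["B-" + slot] + ["I-" + slot] * (len(value) - 1))
        else:
            tokens.append(tok)
            tags.append("O")
    return tokens, tags
```

Passing an explicit random.Random instance keeps the sampling reproducible across runs.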

Tips:

To evaluate slot filling for the surface-realization-only baseline:
python3 run_slot_filling_evaluation.py -t atis_labeled -gd xiaoming -cd -rfo
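Slot filling is commonly scored with chunk-level F1 over BIO tags, as in the CoNLL conlleval script. The sketch below computes that metric; whether run_slot_filling_evaluation.py scores exactly this way is an assumption, and spans and slot_f1 are hypothetical names.

```python
def spans(tags):
    """Extract (slot_type, start, end) chunks from a BIO tag list."""
    out, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        # A chunk ends at "O", at a new "B-", or at an "I-" of a different type.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != kind):
            out.add((kind, start, i))
            start, kind = None, None
        # A chunk starts at "B-", or at a stray "I-" with no open chunk.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return out


def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged chunk F1 over paired gold/predicted tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = spans(gold), spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```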

Notice

Because the slot-filling model used in our work is a simple Bi-LSTM, and our augmentation method applies to any slot-filling algorithm, we only release the seq2seq augmentation part and the CoNLL-format data generation part.

You can plug in your own slot-filling algorithm.
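To connect your own slot filler, the augmented data only needs to be in a format it can read. Below is a sketch that writes (tokens, tags) pairs in a common CoNLL-style layout: one "token tag" line per token and a blank line between utterances. The exact column layout of the repo's generated files is an assumption (they may use tabs or extra columns), and to_conll is a hypothetical name.

```python
def to_conll(utterances):
    """Serialise (tokens, tags) pairs into CoNLL-style text."""
    blocks = []
    for tokens, tags in utterances:
        if len(tokens) != len(tags):
            raise ValueError("tokens and tags must have the same length")
        # One "token tag" line per token.
        blocks.append("\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)))
    # Blank line between utterances, trailing newline at the end.
    return "\n\n".join(blocks) + "\n"
```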