Implementation of DSS-VAE, from the ACL 2019 paper "Generating Sentences from Disentangled Syntactic and Semantic Spaces".

Environment requirements

Data Preparation

Prerequisite: you may need to use the constituency parser ZPar to obtain the constituency parse tree of each sentence.

Preprocessing consists of three steps:

  1. Tokenization
python dss_vae/preprocess/my_tokenize.py --raw_file [raw_file_path] --token_file [token_out_path] --for_parse
  2. Parsing
Please refer to [ZPar](https://sourceforge.net/projects/zpar/files/0.7.5/zpar-0.7.5.tar.gz/download), an easy-to-use constituency parser, to obtain the constituency parse tree of each sentence.
  3. Building the dataset
python dss_vae/preprocess/tree_linearization.py --tree_file [tree_file_path] --out_file [tree_out_path] --mode s2b
python dss_vae/structs/generate_dataset.py --train_file [<Sentence,LinearTree> file] --dev_file [<Sentence,LinearTree> file] --test_file [<Sentence,LinearTree> file] --tgt_dir [output_dir] --max_src_vocab 30000 --max_src_len 30 --max_tgt_len 90 --train_size 100000
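The parsing and dataset-building steps can be made concrete with a small sketch. It assumes the parser emits one bracketed tree per line, e.g. `(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))`, and uses one common linearization (constituent labels kept, words dropped, pre-terminals collapsed to their POS tags). This is an illustrative assumption, not necessarily the format produced by the repository's `--mode s2b`. The helper `filter_pairs` is hypothetical and only mimics what `--max_src_len`, `--max_tgt_len` and `--train_size` appear to control.

```python
import re
from typing import List, Tuple

# A node is (label, children); each child is either a word (str) or another node.
Node = Tuple[str, list]


def tokenize_tree(tree: str) -> List[str]:
    """Split a bracketed tree into '(', ')', labels and words.

    Assumes literal brackets inside words are escaped (e.g. -LRB-, -RRB-).
    """
    return re.findall(r"\(|\)|[^\s()]+", tree)


def parse(tokens: List[str], i: int = 0) -> Tuple[Node, int]:
    """Recursive-descent parse of the subtree starting at tokens[i] == '('.

    Returns the node and the index just past its closing bracket.
    """
    assert tokens[i] == "(", f"expected '(' at position {i}"
    label, i = tokens[i + 1], i + 2
    children: list = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
            children.append(child)
        else:
            children.append(tokens[i])  # a word (leaf)
            i += 1
    return (label, children), i + 1


def leaves(node: Node) -> List[str]:
    """Collect the words of the tree, i.e. the tokenized sentence."""
    _, children = node
    out: List[str] = []
    for c in children:
        out.extend([c] if isinstance(c, str) else leaves(c))
    return out


def linearize(node: Node) -> List[str]:
    """Depth-first linearization that keeps labels and drops words (assumed scheme).

    Pre-terminals collapse to their POS tag; other constituents become '(LABEL' ... ')'.
    """
    label, children = node
    subtrees = [c for c in children if not isinstance(c, str)]
    if not subtrees:  # pre-terminal such as (DT the)
        return [label]
    out = ["(" + label]
    for c in subtrees:
        out.extend(linearize(c))
    out.append(")")
    return out


def filter_pairs(trees: List[str], max_src_len: int = 30, max_tgt_len: int = 90,
                 max_size: int = 100000) -> List[Tuple[List[str], List[str]]]:
    """Build <Sentence, LinearTree> pairs, dropping pairs over the length limits
    and stopping after max_size pairs (hypothetical helper)."""
    pairs = []
    for tree in trees:
        node, _ = parse(tokenize_tree(tree))
        src, tgt = leaves(node), linearize(node)
        if len(src) <= max_src_len and len(tgt) <= max_tgt_len:
            pairs.append((src, tgt))
        if len(pairs) >= max_size:
            break
    return pairs


node, _ = parse(tokenize_tree("(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))"))
print(leaves(node))     # ['the', 'cat', 'sleeps']
print(linearize(node))  # ['(S', '(NP', 'DT', 'NN', ')', '(VP', 'VBZ', ')', ')']
```

In this example a 3-word sentence becomes a 9-token linear tree. Under this scheme, linear trees are much longer than their sentences, which is consistent with the target length limit (90) being set well above the source limit (30).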

After preprocessing, the prepared data directory is structured as follows:

+-- Target Dir
|   +-- train.bin
|   +-- test.bin
|   +-- dev.bin
|   +-- vocab.bin
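Before training, a quick sanity check can confirm that preprocessing produced all four files. The sketch below only tests for their presence by name. `missing_data_files` is a hypothetical helper, and the `.bin` contents are not inspected because their serialization format is not documented here.

```python
from pathlib import Path
from typing import List

# File names taken from the directory layout above.
EXPECTED_FILES = ("train.bin", "dev.bin", "test.bin", "vocab.bin")


def missing_data_files(target_dir: str) -> List[str]:
    """Return the expected preprocessing outputs that are absent from target_dir."""
    root = Path(target_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Called with the `--tgt_dir` used in step 3, the helper returns an empty list when the directory is complete.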

Training

All hyper-parameters are set in config.yaml. Train the model or one of its variants with the following command:

python main.py --config_files [config.yaml] --mode train_vae --exp_name [exp_name]

Example config.yaml files are provided in the CONFIGS directory.
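With several example configurations available, the training command can be generated for each one. This sketch uses only the flags shown above. `train_commands` and `dry_run` are hypothetical helpers, and matching only `*.yaml` files and naming each experiment after its config file's stem are assumptions.

```python
import shlex
from pathlib import Path
from typing import List


def train_commands(config_dir: str = "CONFIGS") -> List[List[str]]:
    """Build one `python main.py ... --mode train_vae` command per *.yaml file in config_dir."""
    commands = []
    for cfg in sorted(Path(config_dir).glob("*.yaml")):
        commands.append([
            "python", "main.py",
            "--config_files", str(cfg),
            "--mode", "train_vae",
            "--exp_name", cfg.stem,  # assumption: experiment named after the config file
        ])
    return commands


def dry_run(config_dir: str = "CONFIGS") -> None:
    """Print each command as a shell-quoted string instead of running it."""
    for cmd in train_commands(config_dir):
        print(shlex.join(cmd))
```

Calling `dry_run()` prints the commands. Replacing the `print` with `subprocess.run(cmd, check=True)` would launch the runs one after another and stop at the first failure.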

Citations

If this project helps your research, please consider citing our paper. The BibTeX entry is as follows:

@inproceedings{bao-etal-2019-generating,
    title = "Generating Sentences from Disentangled Syntactic and Semantic Spaces",
    author = "Bao, Yu  and
      Zhou, Hao  and
      Huang, Shujian  and
      Li, Lei  and
      Mou, Lili  and
      Vechtomova, Olga  and
      Dai, Xin-yu  and
      Chen, Jiajun",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1602",
    doi = "10.18653/v1/P19-1602",
    pages = "6008--6019",
}