TGen

A statistical natural language generator for spoken dialogue systems

TGen is a statistical natural language generator, with two different algorithms supported:

  1. A statistical sentence planner based on A*-style search, with a candidate plan generator and a perceptron ranker
  2. A sequence-to-sequence (seq2seq) recurrent neural network architecture based on the TensorFlow toolkit
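
To give a rough feel for the first approach, here is a minimal sketch of a search guided by a perceptron ranker. It is not TGen's implementation: plans are flat token tuples rather than deep syntax trees, the search is plain best-first ordered by the ranker score (a simplification of A*-style search), and all names (`expand`, `features`, `score`, `search`, `perceptron_update`) and the toy data are hypothetical.

```python
import heapq
from itertools import count


def expand(plan, vocabulary):
    """Candidate generator: every plan that extends `plan` by one unused token."""
    for token in vocabulary:
        if token not in plan:
            yield plan + (token,)


def features(plan, da):
    """Hypothetical binary features: (DA slot, plan token) co-occurrences."""
    return {(slot, token) for slot in da for token in plan}


def score(plan, da, weights):
    """Linear ranker score: sum of the weights of all active features."""
    return sum(weights.get(f, 0.0) for f in features(plan, da))


def search(da, vocabulary, weights, max_steps=50):
    """Best-first search over candidate plans, ordered by the ranker score."""
    tie = count()  # tie-breaker so the heap never compares plans directly
    best, best_score = (), score((), da, weights)
    open_list = [(-best_score, next(tie), ())]
    for _ in range(max_steps):
        if not open_list:
            break
        neg_score, _, plan = heapq.heappop(open_list)
        if -neg_score > best_score:
            best, best_score = plan, -neg_score
        for cand in expand(plan, vocabulary):
            heapq.heappush(open_list, (-score(cand, da, weights), next(tie), cand))
    return best


def perceptron_update(weights, da, gold, predicted, lr=1.0):
    """Standard perceptron step: reward gold features, penalize predicted ones."""
    if predicted == gold:
        return
    for f in features(gold, da):
        weights[f] = weights.get(f, 0.0) + lr
    for f in features(predicted, da):
        weights[f] = weights.get(f, 0.0) - lr


# Toy data (assumed for illustration only).
weights = {}
da = ("food",)
vocabulary = ["italian", "cheap", "restaurant"]
gold = ("italian", "restaurant")
for _ in range(3):
    perceptron_update(weights, da, gold, search(da, vocabulary, weights))
print(search(da, vocabulary, weights))
```

The ranker only learns which tokens belong in the plan; in this toy version the features ignore token order, so ordering is decided by the order in which candidates are generated.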

Both algorithms can be trained from pairs of source meaning representations (dialogue acts) and target sentences. The newer seq2seq approach is preferable: it is faster and produces higher-quality output.
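
As an illustration of what such training pairs might look like, the sketch below parses a dialogue act written in a simple `act(slot=value, ...)` notation into structured items. The notation, the `DAItem` type, the `parse_da` function, and the example pair are assumptions made for this sketch, not TGen's actual input format; see USAGE.md for the real data format.

```python
import re
from typing import List, NamedTuple, Optional


class DAItem(NamedTuple):
    """One (act type, slot, value) triple of a dialogue act (hypothetical type)."""
    act_type: str
    slot: Optional[str]
    value: Optional[str]


def parse_da(text: str) -> List[DAItem]:
    """Parse an assumed `act(slot=value, ...)` notation into DAItem triples.

    Values containing commas or parentheses are not handled by this sketch.
    """
    items = []
    for act_type, args in re.findall(r"(\w+)\(([^)]*)\)", text):
        if not args.strip():
            # An act without any slots, e.g. "goodbye()".
            items.append(DAItem(act_type, None, None))
            continue
        for arg in args.split(","):
            slot, _, value = arg.partition("=")
            value = value.strip().strip("'\"") or None
            items.append(DAItem(act_type, slot.strip(), value))
    return items


# A made-up training pair: source dialogue act and target sentence.
training_pairs = [
    ("inform(name='X', food=Italian)", "X is an Italian restaurant."),
]

for da_text, sentence in training_pairs:
    print(parse_da(da_text), "->", sentence)
```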

Both algorithms support generating sentence plans (deep syntax trees), which are subsequently converted to text using the existing surface realizer from the Treex NLP toolkit. The seq2seq algorithm also supports direct string generation.

For more details on the algorithms, please refer to our papers:

Installation and Usage

Please refer to USAGE.md for instructions on how to use TGen.

Notice

Citing TGen

If you use or refer to the seq2seq generation in TGen, please cite this paper:

If you use or refer to the context-aware improved seq2seq generation, please cite this paper:

If you use or refer to the morphology-aware generation (designed for Czech), please cite this paper (link coming soon):

If you use or refer to the A*-search generation in TGen, please cite this paper:

License

Author: Ondřej Dušek

Copyright © 2014-2019 Institute of Formal and Applied Linguistics, Charles University, Prague.

Licensed under the Apache License, Version 2.0 (see LICENSE.txt).

Acknowledgements

Work on this project was funded by the Ministry of Education, Youth and Sports of the Czech Republic under the grant agreement LK11221 and core research funding, SVV projects 260 104 and 260 333, and GAUK grant 2058214 of Charles University in Prague, as well as Charles University project PRIMUS/19/SCI/10. It used language resources stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (projects LM201001 and LM2015071).