Grammar Variational Autoencoder

This repository contains training and sampling code for the paper: <a href="https://arxiv.org/abs/1703.01925">Grammar Variational Autoencoder</a>.

Requirements

Python 2.7

Install (CPU version) using pip install -r requirements.txt

For GPU compatibility, replace the fourth line in requirements.txt with: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl

Creating datasets

Molecules

To create the molecule datasets, call:

Equations

The equation dataset can be downloaded here: grammar, string

Training

Molecules

To train the molecule models, call:

Equations

Sampling

Molecules

The file molecule_vae.py can be used to encode and decode SMILES strings. For a demo run:

Equations

The analogous file equation_vae.py can encode and decode equation strings. Run:

Bayesian optimization

The Bayesian optimization experiments use sparse Gaussian processes implemented in Theano.

We use a modified version of Theano with a few add-ons, e.g. to compute the log determinant of a positive definite matrix in a numerically stable manner. The modified version of Theano can be installed by going to the folder Theano-master and typing

The experiments with molecules require the rdkit library, which can be installed as described in <a href="http://www.rdkit.org/docs/Install.html">http://www.rdkit.org/docs/Install.html</a>.
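
For readers unfamiliar with the numerically stable log determinant mentioned above, here is a minimal plain-Python sketch of the usual approach: factor the matrix with a Cholesky decomposition and sum the logs of the diagonal. This is not the op shipped in the modified Theano. The function names `cholesky` and `logdet_pd` are hypothetical and chosen only for illustration.

```python
import math


def cholesky(a):
    """Return the lower-triangular factor L with a = L * L^T.

    `a` is a symmetric positive definite matrix given as a list of rows.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def logdet_pd(a):
    """log(det(a)) computed as 2 * sum(log(diag(L))).

    Summing logs avoids forming det(a) itself, which can underflow
    or overflow even when its logarithm is a moderate number.
    """
    l = cholesky(a)
    return 2.0 * sum(math.log(l[i][i]) for i in range(len(a)))


if __name__ == "__main__":
    # det = 4*3 - 2*2 = 8, so the result should equal log(8).
    print(logdet_pd([[4.0, 2.0], [2.0, 3.0]]))
```

The direct product of the diagonal entries of a matrix such as 1e-200 times the 3x3 identity underflows to zero in double precision, while the sum of logs stays finite. That is why Gaussian process code typically works with the Cholesky factor.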

The Bayesian optimization experiments can be replicated as follows:

1 - Generate the latent representations of molecules and equations. For this, go to the folders

molecule_optimization/latent_features_and_targets_grammar/

molecule_optimization/latent_features_and_targets_character/

equation_optimization/latent_features_and_targets_grammar/

equation_optimization/latent_features_and_targets_character/

and type

2 - Go to the folders

molecule_optimization/simulation1/grammar/

molecule_optimization/simulation1/character/

equation_optimization/simulation1/grammar/

equation_optimization/simulation1/character/

and type

Repeat this step for all the simulation folders (simulation2,...,simulation10). For speed, it is recommended to run the simulations in parallel on a computer cluster.
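
When dispatching these runs to a cluster, it can help to enumerate every simulation folder up front. The sketch below only builds the list of relative folder paths from the names given above. It does not run anything, and the helper name `simulation_dirs` is hypothetical.

```python
def simulation_dirs(n_simulations=10):
    """List the relative paths of all simulation folders.

    Assumes the layout described above:
    <task>/simulation<i>/<model>/ for i = 1..n_simulations.
    """
    tasks = ["molecule_optimization", "equation_optimization"]
    models = ["grammar", "character"]
    return [
        "%s/simulation%d/%s" % (task, i, model)
        for task in tasks
        for i in range(1, n_simulations + 1)
        for model in models
    ]


if __name__ == "__main__":
    # One line per folder; each can be handed to a separate cluster job.
    for path in simulation_dirs():
        print(path)
```

With the default of 10 simulations this gives 40 folders (2 tasks, 10 simulations, 2 models), each of which can be run independently.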

3 - Extract the results by going to the folders

molecule_optimization/

equation_optimization/

and typing