Normalizing flows

Reimplementations of density estimation algorithms from:

Block Neural Autoregressive Flow

https://arxiv.org/abs/1904.04676

Implementation of BNAF on toy density estimation datasets.

Results

Density estimation of 2d toy data and density estimation of 2d test energy potentials (cf. Figure 2 & 3 in paper):

The models were trained for 20,000 steps with the architectures and hyperparameters described in Section 5 of the paper, with the exception of the rings dataset (bottom right), which used 5 hidden layers. The models trained significantly faster than the planar flow model of Rezende & Mohamed and were much more stable; interestingly, BNAF stretches space differently and requires many more test points to show a smooth potential.
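
For context, flow-based density estimation maximizes the exact log-likelihood given by the change of variables formula, log q(x) = log p(f(x)) + log |det df/dx|. Below is a minimal sketch of such a training step on a 2d toy batch; the simple affine transform is a stand-in for illustration only and is not the BNAF stack implemented in bnaf.py.

```python
import torch
import torch.nn as nn

class DiagonalAffineFlow(nn.Module):
    """Toy invertible transform z = x * exp(s) + t with an analytic log-det.
    Illustrative stand-in only; BNAF layers are far more expressive."""
    def __init__(self, dim=2):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        z = x * self.log_scale.exp() + self.shift
        log_det = self.log_scale.sum().expand(x.shape[0])   # log |det df/dx| per sample
        return z, log_det

def nll(flow, x, base_dist):
    # change of variables: log q(x) = log p(f(x)) + log |det df/dx|
    z, log_det = flow(x)
    return -(base_dist.log_prob(z).sum(dim=1) + log_det).mean()

flow = DiagonalAffineFlow()
base_dist = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-2)
for step in range(1000):
    x = torch.randn(256, 2) * 2.0 + 1.0      # placeholder for a toy-data batch
    optimizer.zero_grad()
    nll(flow, x, base_dist).backward()
    optimizer.step()
```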

[Image grid: density matching on the 2d energy potentials u1-u4 (left column) and density estimation on the 2d toy datasets 8gaussians, checkerboard, 2spirals, and rings (right column).]

Usage

To train model:

python bnaf.py --train
               --dataset      # choice from u1, u2, u3, u4, 8gaussians, checkerboard, 2spirals
               --log_interval # how often to save the model and visualize results
               --n_steps      # number of training steps
               --n_hidden     # number of hidden layers
               --hidden_dim   # dimension of the hidden layer
               --[add'l options]

Additional options are: learning rate, learning rate decay and patience, CUDA device id, and batch size.

To plot model:

python bnaf.py --plot
               --restore_file [path to .pt checkpoint]

Useful resources

Glow: Generative Flow with Invertible 1x1 Convolutions

https://arxiv.org/abs/1807.03039

Implementation of Glow on CelebA and MNIST datasets.

Results

I trained two models: a 3-level / 32-depth / 512-width model and a 4-level / 48-depth / 512-width model (details under Usage below).

In both cases, gradients were clipped at norm 50 and the learning rate was 1e-3 with linear warmup from 0 over 2 epochs. Both reached similar results of around 4.2 bits/dim.
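
For reference, the bits/dim figure is the negative log-likelihood converted from nats to bits and divided by the number of image dimensions; a minimal sketch of that conversion (the dequantization and scaling terms are assumed to already be included in the NLL reported by glow.py):

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert an average per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

# e.g. a 64x64x3 image has 64 * 64 * 3 = 12288 dimensions
print(bits_per_dim(nll_nats=35000.0, num_dims=64 * 64 * 3))  # illustrative numbers only
```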

Samples at varying temperatures

Temperatures of 0, 0.25, 0.5, 0.6, 0.7, 0.8, 0.9, and 1 (one row per temperature, top to bottom):

[Image grids: samples from Model A (left) and Model B (right) across the temperature range.]

Samples at temperature 0.7:

[Image grids: samples from Model A (left) and Model B (right) at temperature 0.7.]
Model A attribute manipulation on an in-distribution sample:

Embedding vectors were calculated for the first 30K training images; for each attribute, the mean embedding of negative examples was subtracted from the mean embedding of positive examples to obtain a direction dz. This dz was scaled over a range of values and applied to a test-set image (the middle image is the unchanged / actual data point).
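
A minimal sketch of this manipulation, assuming hypothetical encode / decode helpers for the flow and a boolean attribute label per image (these are not the exact functions in glow.py):

```python
import torch

@torch.no_grad()
def attribute_direction(encode, images, has_attr):
    """dz = mean latent of images with the attribute minus mean latent of images without it."""
    z = encode(images)                             # hypothetical: images -> flat latent vectors
    return z[has_attr].mean(dim=0) - z[~has_attr].mean(dim=0)

@torch.no_grad()
def manipulate(encode, decode, image, dz, alphas=(-2, -1, 0, 1, 2)):
    """Apply scaled versions of dz to one image's latent; alpha = 0 reconstructs the input."""
    z = encode(image.unsqueeze(0))
    return [decode(z + alpha * dz) for alpha in alphas]
```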

[Image grids over the dz range [-2, -1, 0, 1, 2] for the attributes: Brown hair, Male, Mouth slightly opened, Young.]
Model A attribute manipulation on an 'out-of-distribution' sample (i.e. me):

[Image grids over the dz range for the attributes: Brown hair, Mouth slightly opened.]

Usage

To train a model using the PyTorch distributed package:

python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
       glow.py --train \
               --distributed \
               --dataset=celeba \
               --data_dir=[path to data source] \
               --n_levels=3 \
               --depth=32 \
               --width=512 \
               --batch_size=16 [this is per GPU]

For larger models or image sizes, add --checkpoint_grads to checkpoint gradients using PyTorch's checkpointing utility. I trained a 3-level / 32-depth / 512-width model with a batch size of 16 without gradient checkpointing, and a 4-level / 48-depth / 512-width model with a batch size of 16 which had ~190M params and so required gradient checkpointing (and was painfully slow on 8 GPUs).
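
For context, gradient checkpointing drops intermediate activations during the forward pass and recomputes them during backprop, trading compute for memory. A minimal sketch using torch.utils.checkpoint (the module below is a stand-in, not the Glow blocks that --checkpoint_grads wraps):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
x = torch.randn(16, 512, requires_grad=True)

# activations inside `block` are recomputed on the backward pass instead of being stored
y = checkpoint(block, x)
y.sum().backward()
```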

To evaluate model:

python glow.py --evaluate \
               --restore_file=[path to .pt checkpoint] \
               --dataset=celeba \
               --data_dir=[path to data source] \
               --[options of the saved model: n_levels, depth, width, batch_size]

To generate samples from a trained model:

python glow.py --generate \
               --restore_file=[path to .pt checkpoint] \
               --dataset=celeba \
               --data_dir=[path to data source] \
               --[options of the saved model: n_levels, depth, width, batch_size] \
               --z_std=[temperature parameter; if blank, generates range]
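
The z_std flag is the sampling temperature: it scales the standard deviation of the base Gaussian before the latents are pushed back through the inverse flow. A minimal sketch of that idea (flow_inverse is a hypothetical callable, not glow.py's internals):

```python
import torch

@torch.no_grad()
def sample_at_temperature(flow_inverse, z_shape, z_std=0.7, n_samples=16):
    """Draw z ~ N(0, z_std^2 I) and decode to image space; z_std plays the role of --z_std."""
    z = z_std * torch.randn(n_samples, *z_shape)
    return flow_inverse(z)
```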

To visualize manipulations on specific image given a trained model:

python glow.py --visualize \
               --restore_file=[path to .pt checkpoint] \
               --dataset=celeba \
               --data_dir=[path to data source] \
               --[options of the saved model: n_levels, depth, width, batch_size] \
               --z_std=[temperature parameter; if blank, uses default] \
               --vis_attrs=[list of indices of attribute to be manipulated, if blank, manipulates every attribute] \
               --vis_alphas=[list of values by which `dz` should be multiplied, defaults [-2,2]] \
               --vis_img=[path to image to manipulate (note: size needs to match dataset); if blank uses example from test dataset]

Datasets

To download CelebA follow the instructions here. A nice script that simplifies downloading and extracting can be found here: https://github.com/nperraud/download-celebA-HQ/

References

Masked Autoregressive Flow

https://arxiv.org/abs/1705.07057

Reimplementation of MADE, MAF, Mixture of Gaussians MADE, Mixture of Gaussians MAF, and RealNVP modules on UCI datasets and MNIST.
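
For context, a MAF layer evaluates densities in a single forward pass of its MADE network: each dimension is shifted and scaled by functions of the preceding dimensions, so the Jacobian is triangular and its log-determinant is simply -sum_i alpha_i. A minimal sketch of that density evaluation, assuming a hypothetical made network returning per-dimension means and log-scales (not the module interfaces in maf.py):

```python
import torch

def maf_log_prob(made, x, base_dist):
    """log p(x) = log p(u) - sum_i alpha_i, with u_i = (x_i - mu_i(x_<i)) * exp(-alpha_i(x_<i))."""
    mu, log_alpha = made(x)                  # hypothetical autoregressive network
    u = (x - mu) * torch.exp(-log_alpha)
    return base_dist.log_prob(u).sum(dim=1) - log_alpha.sum(dim=1)
```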

Results

Average test log likelihood for un/conditional density estimation (cf. Table 1 & 2 in paper for results and parameters; models here were trained for 50 epochs):

| Model | POWER | GAS | HEPMASS | MINIBOONE | BSDS300 | MNIST (uncond) | MNIST (cond) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MADE | -3.10 +/- 0.02 | 2.53 +/- 0.02 | -21.13 +/- 0.01 | -15.36 +/- 15.06 | 146.42 +/- 0.14 | -1393.67 +/- 1.90 | -1340.98 +/- 1.71 |
| MADE MOG | 0.37 +/- 0.01 | 8.08 +/- 0.02 | -15.70 +/- 0.02 | -11.64 +/- 0.44 | 153.56 +/- 0.28 | -1023.13 +/- 1.69 | -1013.75 +/- 1.61 |
| RealNVP (5) | -0.49 +/- 0.01 | 7.01 +/- 0.06 | -19.96 +/- 0.02 | -16.88 +/- 0.21 | 148.34 +/- 0.26 | -1279.76 +/- 9.91 | -1276.33 +/- 12.21 |
| MAF (5) | 0.03 +/- 0.01 | 6.23 +/- 0.01 | -17.97 +/- 0.01 | -11.57 +/- 0.21 | 153.53 +/- 0.27 | -1272.70 +/- 1.87 | -1268.24 +/- 2.73 |
| MAF MOG (5) | 0.09 +/- 0.01 | 7.96 +/- 0.02 | -17.29 +/- 0.02 | -11.27 +/- 0.41 | 153.35 +/- 0.26 | -1080.46 +/- 1.53 | -1070.33 +/- 1.53 |

Toy density model (cf. Figure 1 in paper):

[Image row: target density; learned density with MADE and random numbers driving MADE; learned density with MAF (5 layers) and random numbers driving MAF.]

Class-conditional generated images from MNIST using the MAF (5) model; generated data arranged by decreasing log probability (cf. Figure 3 in paper):

[Image grid: class-conditional MNIST samples.]

Usage

To train model:

python maf.py --train \
              --model=['made' | 'mademog' | 'maf' | 'mafmog' | 'realnvp'] \
              --dataset=['POWER' | 'GAS' | 'HEPMASS' | 'MINIBOONE' | 'BSDS300' | 'MNIST'] \
              --n_blocks=[for maf/mafmog and realnvp, specify # of MADE blocks / coupling layers] \
              --n_components=[if mixture of Gaussians, specify # of components] \
              --conditional [if MNIST, can train class-conditional log likelihood] \
              --[add'l options; see py file]

To evaluate model:

python maf.py --evaluate \
              --restore_file=[path to .pt checkpoint] \
              --[options of the saved model: n_blocks, n_hidden, hidden_size, n_components, conditional]

To generate data from a trained model (for MNIST dataset):

python maf.py --generate \
              --restore_file=[path to .pt checkpoint] \
              --dataset='MNIST' \
              --[options of the saved model: n_blocks, n_hidden, hidden_size, n_components, conditional]

Datasets

Datasets and preprocessing code are forked from the MAF authors' implementation here. The unzipped datasets should be symlinked into the ./data folder or the data_dir argument should be specified to point to the actual data.

References

Variational inference with normalizing flows

https://arxiv.org/abs/1505.05770

Implementation of planar flows from Variational Inference with Normalizing Flows on 2d test energy potentials.

Results

Density matching on the 2d test energy potentials (cf. Table 1 & Figure 3 in paper):

[Image grid: target densities u_z1 - u_z4 and the learned densities for flows of length K = 2 and K = 32.]

| Target density | Training parameters |
| --- | --- |
| u_z1 | weight init Normal(0,1), base dist. scale 2 |
| u_z2 | weight init Normal(0,1), base dist. scale 1 |
| u_z3 | weight init Normal(0,1), base dist. scale 1, weight decay 1e-3 |
| u_z4 | weight init Normal(0,1), base dist. scale 4, weight decay 1e-3 |
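
For reference, each of the K steps is a planar transformation f(z) = z + u * h(w^T z + b) with log-det Jacobian log |1 + u^T h'(w^T z + b) w|. A minimal sketch of a single layer (ignoring the reparameterization of u that the paper uses to guarantee invertibility, and not the module in planar_flow.py):

```python
import torch
import torch.nn as nn

class PlanarLayer(nn.Module):
    """f(z) = z + u * tanh(w^T z + b); log |det df/dz| = log |1 + u^T (1 - tanh^2(w^T z + b)) w|."""
    def __init__(self, dim=2):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim))
        self.w = nn.Parameter(torch.randn(dim))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        lin = z @ self.w + self.b                               # (batch,)
        f = z + self.u * torch.tanh(lin).unsqueeze(-1)
        psi = (1 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w  # h'(w^T z + b) * w
        log_det = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)
        return f, log_det
```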

Usage

To train model:

python planar_flow.py --train \
                      --target_potential=[choice from u_z1 | u_z2 | u_z3 | u_z4] \
                      --flow_length=[# of layers in flow] \
                      --[add'l options]

Additional options are: base distribution (q0) scale, weight initialization scale, weight decay, learnable first affine layer (I did not find adding an affine layer beneficial).
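
For context, the flow is fit by minimizing the KL divergence between the flowed base distribution q_K and the unnormalized target exp(-U(z)), estimated by Monte Carlo. A minimal sketch of that objective, assuming a flow returning (z_K, summed log-dets) and a target potential U (hypothetical names, not planar_flow.py's API):

```python
def flow_kl_loss(flow, base_dist, U, n_samples=512):
    """Monte Carlo estimate of KL(q_K || exp(-U)/Z), up to the constant log Z."""
    z0 = base_dist.sample((n_samples,))
    zK, sum_log_det = flow(z0)                 # z_K = f_K(...f_1(z_0)), summed log |det|
    log_qK = base_dist.log_prob(z0).sum(dim=1) - sum_log_det
    return (log_qK + U(zK)).mean()
```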

To evaluate model:

python planar_flow.py --evaluate \
                      --restore_file=[path to .pt checkpoint]

Useful resources

Dependencies

Some of the datasets further require: