Recurrent Batch Normalization
Chainer implementation of the batch-normalized LSTM (BN-LSTM) described in Recurrent Batch Normalization [arXiv:1603.09025].
Todo:
- use separate batch statistics for each timestep, as the paper recommends:
The batch normalization transform relies on batch statistics to standardize the LSTM activations. It
would seem natural to share the statistics that are used for normalization across time, just as recurrent
neural networks share their parameters over time. However, we have found that simply averaging
statistics over time severely degrades performance. Although LSTM activations do converge to a
stationary distribution, we have empirically observed that their statistics during the initial transient
differ significantly as figure 1 shows. Consequently, we recommend using separate statistics for
each timestep to preserve information of the initial transient phase in the activations.
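To make "separate statistics" concrete, here is a minimal NumPy sketch. PerTimestepBN is a hypothetical helper, not part of this repo, and the 0.9/0.1 running-average momentum is an arbitrary choice; initializing gamma to 0.1 follows the paper's recommendation. The point it illustrates is keeping one (mean, variance) pair per timestep instead of sharing them across time:

import numpy as np

class PerTimestepBN:
    """Sketch: batch normalization with per-timestep statistics."""

    def __init__(self, size, max_steps, eps=1e-5):
        self.gamma = np.full(size, 0.1)  # paper suggests initializing gamma to 0.1
        self.beta = np.zeros(size)
        self.eps = eps
        # one running mean/variance pair per timestep, not one shared pair
        self.means = np.zeros((max_steps, size))
        self.vars = np.ones((max_steps, size))

    def __call__(self, x, t, train=True):
        # x: (batch, size) activations at timestep t
        if train:
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # update only this timestep's running statistics
            self.means[t] = 0.9 * self.means[t] + 0.1 * mean
            self.vars[t] = 0.9 * self.vars[t] + 0.1 * var
        else:
            mean, var = self.means[t], self.vars[t]
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta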
Requirements
- Chainer 1.8+
Running
Before
from chainer import links as L
lstm = L.LSTM(n_in, n_out)
After
from bnlstm import BNLSTM
lstm = BNLSTM(n_in, n_out)
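For context, a minimal usage sketch, assuming BNLSTM is stateful and is called once per timestep like L.LSTM (the sequence variable and its shapes here are illustrative):

lstm.reset_state()
for x_t in sequence:  # each x_t: a (batch, n_in) array
    h = lstm(x_t)     # h: the (batch, n_out) hidden state for this timestep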