pytorch-bits

Experiments for fun and education. Mostly concerning time-series prediction.

I started my experiments with @osm3000's sequence_generation_pytorch repo, and some of that code still survives in these files.

How to run these experiments

  1. clone/download this repo
  2. pip install -r requirements.txt
  3. python experiment.py [ARGS]

Possible arguments include...

Data generation

The generator produces a tensor of shape (length, batches, 1) containing batches independently generated series, each of the required length.
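
As a concrete illustration, here is a minimal sketch of a generator with that output shape. The sine-wave signal and the generate_batch name are assumptions for the example, not the repo's actual generator.

```python
import math
import torch

def generate_batch(length, batches):
    """Sketch: `batches` independently generated sine series, each `length`
    steps long, returned as a tensor of shape (length, batches, 1)."""
    phase = torch.rand(batches) * 2 * math.pi                    # random phase per series
    freq = 0.05 + 0.1 * torch.rand(batches)                      # random frequency per series
    t = torch.arange(length, dtype=torch.float32).unsqueeze(1)   # (length, 1)
    series = torch.sin(freq * t + phase)                         # broadcasts to (length, batches)
    return series.unsqueeze(-1)                                  # (length, batches, 1)

x = generate_batch(length=100, batches=32)
print(x.shape)  # torch.Size([100, 32, 1])
```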

Model generation

The --layers argument takes a simple model specification. First specify the layer type; if the layer type needs a size, add a "_" followed by a number. You can then append "_key=value" for any keyword arguments; if a keyword contains "_", replace it with "-".

For example, --layers LSTM_50 Dropout_p=.5 CausalConv1d_70_kernel-size=3 specifies a three-layer network: 50 LSTM units in the first layer, Dropout with p=.5 as the second layer, and 70 CausalConv1d units with kernel_size=3 in the third layer.

If the output of the last requested layer doesn't match the number of target values (for these experiments the target size is 1) then the script adds a Linear layer to produce the required number of output values.
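
To make the format concrete, here is a hedged sketch of how a single token of that specification could be parsed; the function name and the value-conversion details are assumptions rather than the repo's actual parsing code.

```python
def parse_layer(spec):
    """Sketch: parse one --layers token such as "LSTM_50", "Dropout_p=.5"
    or "CausalConv1d_70_kernel-size=3" into (type, positional args, kwargs)."""
    def convert(value):
        try:
            return int(value)
        except ValueError:
            return float(value)

    layer_type, *parts = spec.split("_")
    args, kwargs = [], {}
    for part in parts:
        if "=" in part:
            key, value = part.split("=")
            kwargs[key.replace("-", "_")] = convert(value)  # "-" maps back to "_"
        else:
            args.append(convert(part))                      # a bare number is the layer size
    return layer_type, args, kwargs

print(parse_layer("LSTM_50"))                        # ('LSTM', [50], {})
print(parse_layer("Dropout_p=.5"))                   # ('Dropout', [], {'p': 0.5})
print(parse_layer("CausalConv1d_70_kernel-size=3"))  # ('CausalConv1d', [70], {'kernel_size': 3})
```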

Layers

All of these recurrent layers keep track of their own hidden state (when needed, the hidden state is accessible via the hidden attribute). They all have reset_hidden() and detach_hidden() methods.

reset_hidden() should be called before feeding the model the start of a new sequence. detach_hidden() can be called between batches of the same set of sequences to truncate backpropagation through time, avoiding the slowdown of backpropagating all the way back to the beginning of the entire sequence.

Moreover, they all take input of shape (seq_len, batch_size, features), which allows vectorising any calculations that don't depend on the hidden state.
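
Here is a minimal sketch of that interface, wrapping the standard nn.LSTM; it illustrates the contract described above rather than reproducing the repo's actual layer implementations.

```python
import torch.nn as nn

class StatefulLSTM(nn.Module):
    """Sketch: a layer that owns its hidden state and exposes
    reset_hidden() and detach_hidden()."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.hidden = None

    def reset_hidden(self):
        # Forget everything: call before feeding the start of a new sequence.
        self.hidden = None

    def detach_hidden(self):
        # Keep the values but cut the autograd graph: call between successive
        # chunks of the same sequences to truncate backpropagation through time.
        if self.hidden is not None:
            self.hidden = tuple(h.detach() for h in self.hidden)

    def forward(self, x):  # x: (seq_len, batch_size, features)
        output, self.hidden = self.lstm(x, self.hidden)
        return output
```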

Planned

Ideas/research

Optimisers

COCOB is great for quick experiments: it achieves a near-optimal learning rate with no hyperparameter tuning, so you can quickly tell which experiments are going nowhere. However, I suspect it relies too heavily on assumptions about the convexity of the loss, and other optimisers often reach a lower loss after many epochs. Adam_HD tunes the learning rate of Adam by backpropagating through the update function. It learns pretty fast too.
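
For illustration, here is a minimal sketch of the hypergradient idea behind that kind of optimiser, applied to plain SGD rather than Adam for simplicity; the function name and the hyper_lr default are assumptions, not the repo's Adam_HD.

```python
import torch

def sgd_hd_step(params, lr, prev_grads, hyper_lr=1e-4):
    """Sketch of hypergradient descent with plain SGD: the learning rate is
    itself updated by gradient descent, using d(loss)/d(lr), which for SGD is
    minus the dot product of the current and previous gradients."""
    grads = [p.grad.detach().clone() for p in params]
    if prev_grads is not None:
        hypergrad = -sum((g * pg).sum().item() for g, pg in zip(grads, prev_grads))
        lr = lr - hyper_lr * hypergrad           # descend on the learning rate too
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g                          # the usual SGD step with the updated lr
    return lr, grads
```

It would be called after loss.backward(), feeding the returned lr and grads back in on the next step.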

Planned

Ideas/research

Activations

Note that PyTorch's Tanh, Sigmoid and ELU are already very well optimised on CPU. My tests show that my simplistic implementation makes little difference when running on CPU.
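
A quick way to check that on your own machine is a small timing comparison like the one below; naive_tanh here is a hypothetical hand-rolled version, not the implementation in this repo.

```python
import timeit
import torch

def naive_tanh(x):
    # A simplistic tanh built from exp, for comparison with the built-in.
    e2x = torch.exp(2 * x)
    return (e2x - 1) / (e2x + 1)

x = torch.randn(1000, 256)
print("torch.tanh:", timeit.timeit(lambda: torch.tanh(x), number=1000))
print("naive_tanh:", timeit.timeit(lambda: naive_tanh(x), number=1000))
```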

Planned

Regularisers

Planned