# rnn-auxiliary-loss

A PyTorch implementation of *Learning Longer-term Dependencies in RNNs with Auxiliary Losses* (Trinh et al., 2018).

The paper proposes a simple method to augment RNNs with unsupervised auxiliary losses in order to improve their ability to capture long-term dependencies.

<img src="https://raw.githubusercontent.com/belepi93/rnn-auxiliary-loss/master/pics/overview.png" width="500"> <img src="https://raw.githubusercontent.com/belepi93/rnn-auxiliary-loss/master/pics/r-LSTM.png" width="500"> <img src="https://raw.githubusercontent.com/belepi93/rnn-auxiliary-loss/master/pics/result.png" width="500">

This repo provides an incomplete implementation of an LSTM augmented with a reconstruction auxiliary loss (r-LSTM). Since I was not able to find any reference code for this paper, I had to improvise many details myself. I tried to reproduce the paper's results, but without success. Comments and contributions to improve this repo are very welcome. Thanks!
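For orientation, here is a minimal sketch of the reconstruction auxiliary loss as I understood it: a decoder LSTM, initialized from the main LSTM's state at a randomly sampled anchor position, is trained to reconstruct the inputs that precede the anchor. All names (`RLSTMSketch`, `rec_len`, etc.), the sub-sequence length, the teacher-forced decoder inputs, and the MSE reconstruction target are my own guesses, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RLSTMSketch(nn.Module):
    """Rough sketch of an LSTM with a reconstruction auxiliary loss (r-LSTM).

    Many details are improvised; the paper does not publish reference code.
    """

    def __init__(self, input_size=128, hidden_size=128, num_classes=10, rec_len=50):
        super().__init__()
        self.main_lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.aux_decoder = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)
        self.reconstruct = nn.Linear(hidden_size, input_size)
        self.rec_len = rec_len

    def forward(self, x):
        # x: (batch, seq_len, input_size), the embedded pixel sequence.
        out, _ = self.main_lstm(x)
        logits = self.classifier(out[:, -1])  # classify from the final state

        # Auxiliary loss: sample an anchor position, then ask the decoder,
        # initialized with the main LSTM's hidden state at the anchor, to
        # reconstruct the rec_len inputs preceding it (teacher-forced).
        anchor = torch.randint(self.rec_len, x.size(1), (1,)).item()
        target = x[:, anchor - self.rec_len:anchor]
        h0 = out[:, anchor].unsqueeze(0).contiguous()   # (1, batch, hidden)
        c0 = torch.zeros_like(h0)
        # Shift the targets right by one step for teacher forcing.
        dec_in = torch.cat([torch.zeros_like(target[:, :1]), target[:, :-1]], dim=1)
        dec_out, _ = self.aux_decoder(dec_in, (h0, c0))
        aux_loss = F.mse_loss(self.reconstruct(dec_out), target)
        return logits, aux_loss
```

During pretraining only `aux_loss` would be minimized; afterwards it is mixed with the classification loss. This corresponds to the pretraining phase controlled by `--pre_epochs` below.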

## Requirements

- PyTorch 0.4 & Python 3.6
- NumPy
- torchvision
- tensorboardX

## Examples

- `python main.py --cuda` for full training with truncated BPTT of length 300.
- `python main.py --cuda --bptt 784` for full training with full BPTT.
- `python main.py --cuda --single` for the plain LSTM baseline without the auxiliary loss.
- `python main.py --cuda --pre_epochs 0` to skip pretraining.
- `python main.py --cuda --dataset MNIST` to use MNIST as the dataset.

## Dataset

You can use MNIST or pMNIST with `--dataset MNIST` or `--dataset pMNIST`.

pMNIST is sequential MNIST in which the pixel sequence of every image is permuted in the same fixed random order. Long-term dependencies are harder to capture in pMNIST, so the benefit of r-LSTM stands out much more on this dataset.
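A minimal sketch of building pMNIST on top of torchvision's MNIST with one fixed permutation (the seed and variable names here are arbitrary choices of mine):

```python
import torch
from torchvision import datasets, transforms

# One permutation of the 784 pixel positions, fixed for the whole dataset.
perm = torch.randperm(784, generator=torch.Generator().manual_seed(42))

transform = transforms.Compose([
    transforms.ToTensor(),                              # (1, 28, 28) in [0, 1]
    transforms.Lambda(lambda img: img.view(-1)[perm]),  # flatten, then permute
])

train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
x, y = train_set[0]  # x: (784,) permuted pixel sequence
```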

## pMNIST Benchmark Results

| Models            | No Emb | Full Emb | Part Emb |
|-------------------|--------|----------|----------|
| LSTM Full BP      | 0.9095 | 0.8406   | 0.8759   |
| LSTM Truncate 300 | 0.9026 | 0.841    | 0.873    |
| r-LSTM T300       | 0.9037 | 0.8743   | 0.8863   |
| r-LSTM Full BP    | 0.9129 | 0.8856   | 0.8835   |

I'm not sure how the paper implements the input embedding (input projection), so I tested several methods. In **No Emb**, each 1-dimensional input pixel is projected to a 128-dimensional dense vector with a linear layer, without an embedding matrix. In **Full Emb**, every pixel value (0–255) has its own dense vector in an embedding matrix. In **Part Emb**, the embedding is applied only to the auxiliary decoder network.
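As a sketch of the first two variants (the layer names and batch shapes are mine; the 128-dimensional size follows the setting above):

```python
import torch
import torch.nn as nn

emb_dim = 128
x = torch.rand(32, 784, 1)  # (batch, seq_len, 1) pixel intensities in [0, 1]

# No Emb: project the scalar pixel value with a linear layer.
no_emb = nn.Linear(1, emb_dim)
h = no_emb(x)                          # (batch, seq_len, 128)

# Full Emb: a learned vector per discrete pixel value 0..255.
full_emb = nn.Embedding(256, emb_dim)
ix = (x.squeeze(-1) * 255).long()      # discretize to integer pixel values
h = full_emb(ix)                       # (batch, seq_len, 128)

# Part Emb would feed the linear projection to the main LSTM while
# using the embedding matrix only inside the auxiliary decoder network.
```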