Auto encoder for time series

EDIT (3 December 2018): I receive many questions over email. I compiled the most common questions into a FAQ at the end of this readme.

This repo presents a simple auto encoder for time series. It visualizes the embeddings using both PCA and tSNE. I show this on a dataset of 5000 ECGs. The model doesn't use the labels during training. Yet, the produced clusters visually separate the classes of ECGs.

People repeatedly ask me how to find patterns in time series using ML. The usual wavelet transforms and other features fail to yield results. They wonder what ML has to offer.

Why use a Recurrent Neural Network in an auto encoder?

The network

From here on, RNN refers to our Recurrent Neural Network architecture, the Long Short-Term Memory (LSTM). Our network in AE_ts_model.py has four main blocks.

Training Objective

An auto encoder learns the identity function, so the sequence of input and output vectors must be similar. In our case, we take a probabilistic approach. Every output is a tuple of a mean, mu, and a standard deviation, sigma. Let this mu and sigma parametrize a Gaussian distribution. Now we minimize the negative log-likelihood of the input under this distribution. We train this using backpropagation into the weights of the encoder, decoder and linear layers.
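The Gaussian objective described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in AE_ts_model.py: the function name `gaussian_nll` and the choice to average over all time steps are assumptions made for this example.

```python
import numpy as np

def gaussian_nll(x, mu, sigma):
    """Mean negative log-likelihood of x under N(mu, sigma^2).

    x, mu and sigma are arrays of the same shape, e.g. (batch, time).
    Hypothetical helper for illustration; averaging is an assumption.
    """
    x, mu, sigma = (np.asarray(a, dtype=float) for a in (x, mu, sigma))
    var = sigma ** 2
    # -log N(x | mu, sigma^2) = 0.5 * log(2 * pi * sigma^2) + (x - mu)^2 / (2 * sigma^2)
    nll = 0.5 * np.log(2.0 * np.pi * var) + (x - mu) ** 2 / (2.0 * var)
    return nll.mean()

# A prediction close to the input scores a lower loss than one far away.
signal = np.array([0.0, 1.0, 2.0])
loss_near = gaussian_nll(signal, mu=signal + 0.1, sigma=np.ones(3))
loss_far = gaussian_nll(signal, mu=signal + 2.0, sigma=np.ones(3))
```

Minimizing this quantity pulls mu toward the input, while sigma lets the model express how uncertain it is at each time step.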

Example data

I showcase the recurrent auto encoder on a dataset of 5000 ECGs, aptly named ECG5000 in the UCR archive. I choose ECGs because humans understand them easily. Yet, their complexity remains challenging enough for a machine learning model.

Here are some examples. Each column represents a different input class.

Results

We run the recurrent auto encoder with a 20D latent space. The following figure plots the latent vectors with both PCA and tSNE.
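As a rough sketch of the PCA half of that figure, the snippet below projects 20D vectors onto their first two principal components using only NumPy. The helper name `pca_project` is hypothetical, and the random array merely stands in for the latent vectors the model would produce. It is not the plotting code of this repo.

```python
import numpy as np

def pca_project(z, n_components=2):
    """Project the rows of z onto their top principal components."""
    z = np.asarray(z, dtype=float)
    centered = z - z.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in data: 100 random 20D vectors in place of real latent vectors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 20))
coords = pca_project(latent)  # shape (100, 2), ready for a scatter plot
```

For the tSNE panel, scikit-learn's `sklearn.manifold.TSNE` could play the same role. Coloring the resulting 2D points by class label is what reveals the clusters.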

This figure shows that the latent space exhibits structure. We color the vectors with their corresponding labels. The light blue and dark blue labels clearly cluster in different parts of the space. Interestingly, the lower left corner in the tSNE shows another cluster of orange points. That might be interesting for doctors to look at. (Note that the class distributions are highly unbalanced. The orange and green colored data occur less frequently.)

Conclusion

We present an auto encoder that learns structure in the time-series. Training is unsupervised. When we color the latent vectors with the actual labels, we show that the structure makes sense.

FAQ

To my great joy, I receive many questions and suggestions over email. I compiled some of the commonly asked questions so you can get started quickly.

Please let me know if I forgot your question in this FAQ section.

As always, I am curious about any comments and questions. Reach me at romijndersrob@gmail.com