Timeseries clustering

Timeseries clustering is an unsupervised learning task that aims to partition unlabeled timeseries into homogeneous groups (clusters). Timeseries in the same cluster are more similar to each other than to timeseries in other clusters.

This algorithm is able to:

In general, timeseries clustering algorithms are of two types:

  1. Feature based - transform raw data using feature extraction, run clustering on top of generated features
  2. Raw-data based - directly applied over timeseries vectors without any space-transformations

Variational Recurrent Auto-encoders (VRAE)

VRAE is a feature-based timeseries clustering algorithm. This choice is motivated by the fact that raw-data based approaches suffer from the curse of dimensionality and are sensitive to noisy input data. The middle bottleneck layer serves as the feature representation for the entire input timeseries.

Architecture

<img src='https://raw.githubusercontent.com/tejaslodaya/timeseries-clustering-vae/master/images/architecture.png'>

The network

From here on, RNN refers to a Recurrent Neural Network architecture built from either LSTM or GRU blocks. Our model consists mainly of four blocks.

Loss function

The network is jointly trained on two loss functions:

  1. KL-divergence between the distribution learned in the latent space and the normal distribution.
  2. Reconstruction error. An auto-encoder learns the identity function, so the input and output sequences must be similar. In our case, the MSE loss is calculated between x_decoded and x, and it measures how well the input is reconstructed. SmoothL1Loss is available as an alternative.
<img src='https://raw.githubusercontent.com/tejaslodaya/timeseries-clustering-vae/master/images/math.png' >
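
As a rough illustration of how the two terms combine, here is a minimal sketch in plain Python (it is not the repository's PyTorch code). It assumes a diagonal Gaussian posterior described by a mean and a log-variance per latent dimension and a standard normal prior, which gives the KL term a closed form. The function names and the unweighted sum of the two terms are assumptions made for this example.

```python
import math

def mse_loss(x, x_decoded):
    """Mean squared error between an input sequence and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_decoded)) / len(x)

def kl_divergence(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))

def vrae_loss(x, x_decoded, mu, logvar):
    """Assumed total loss: reconstruction error plus KL term, unweighted."""
    return mse_loss(x, x_decoded) + kl_divergence(mu, logvar)

# Toy values: a 3-step univariate series and a 2-dimensional latent code.
x = [0.0, 1.0, 2.0]
x_decoded = [0.0, 1.0, 1.0]
total = vrae_loss(x, x_decoded, mu=[0.0, 0.0], logvar=[0.0, 0.0])
print(total)
```

The KL term is zero when the posterior already equals the standard normal, so in this toy call the total is driven entirely by the reconstruction error.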

Optimizer

Given the gradients and the weights, Adam is used to update the weights. An option is provided to use Stochastic Gradient Descent (SGD) instead.
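
For readers unfamiliar with Adam, the update for a single scalar weight can be sketched as follows. This is a plain-Python illustration of the standard Adam rule with commonly used default hyperparameters, not the repository's training loop. `adam_step` and its arguments are names chosen for this example.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; returns the new (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)
```

Because the step is normalized by the running gradient magnitude, the first update moves the weight by roughly `lr` regardless of how large the gradient is.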

Why recurrent neural network in an auto-encoder?

Exploratory data analysis is performed on the latent vectors in order to visualize the clusters formed. Since the dimension of z can be more than 2, it has to be converted to a lower dimension to be visualized. PCA and t-SNE are performed to visualize the clusters.
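
To make the projection step concrete, here is a dependency-free sketch of PCA that maps latent vectors to two dimensions. In practice a library implementation of PCA or t-SNE would normally be used; power iteration with deflation is used here only to keep the example self-contained. All function names are hypothetical.

```python
import math
import random

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def quadratic_form(mat, vec):
    """Compute vec^T * mat * vec."""
    d = len(vec)
    return sum(vec[i] * mat[i][j] * vec[j] for i in range(d) for j in range(d))

def top_eigenvector(mat, iters=500, seed=0):
    """Dominant eigenvector of a symmetric PSD matrix via power iteration."""
    rng = random.Random(seed)
    d = len(mat)
    vec = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        vec = [sum(mat[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vec = [x / norm for x in vec]
    return vec

def pca_2d(latent):
    """Project latent vectors onto their first two principal components."""
    rows = center(latent)
    cov = covariance(rows)
    d = len(cov)
    components = []
    for k in range(2):
        vec = top_eigenvector(cov, seed=k)
        eigval = quadratic_form(cov, vec)
        components.append(vec)
        # Deflation: remove the component just found so the next pass finds the second one.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in rows]

# Toy latent vectors with most variance on the first axis, less on the second.
latent = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
points = pca_2d(latent)
print(points)
```

The sign of each principal component is arbitrary, so only relative positions in the resulting 2D plot are meaningful.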

Example - ECG data

NOTE: Please be wary of using this dataset. I developed this library for an internal client project and had to obfuscate the data using an open-source dataset. You may find some errors in the way I'm using this data. Check out the FAQs below if you have doubts about the implementation.

The above network is trained on a dataset of 8500 ECGs and tested on 950 ECGs. Named ECG5000 in the UCR archive, this dataset has 5 classes, and the labels are used to color the points on the PCA and t-SNE charts.

Here's a snapshot of the data: <img src="https://raw.githubusercontent.com/tejaslodaya/timeseries-clustering-vae/master/images/data_examples.png">

Results

When the VRAE model is run on this set of hyperparameters, we get the plots below for both PCA and t-SNE, which project the latent vectors to a lower dimension so that the clusters can be visualized. The figures show that the latent space exhibits structure: the pink and green labels cluster in clearly different parts of the space.

<img src = 'https://raw.githubusercontent.com/tejaslodaya/timeseries-clustering-vae/master/images/pca.png'> <img src = 'https://raw.githubusercontent.com/tejaslodaya/timeseries-clustering-vae/master/images/tsne.png'>

FYI, the entire algorithm is an unsupervised one. Labels are just used to color and visually test the results. If you don't have labels for your application, you can run k-means on top of latent vectors to get labels and use those labels to color individual data points.
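
The k-means step mentioned above can be sketched in plain Python as follows. This is a minimal version of Lloyd's algorithm for illustration, under the assumption that the latent vectors are available as lists of floats. A library implementation would normally be preferred, and the function names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy latent vectors forming two well-separated groups.
latent = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = kmeans(latent, k=2)
print(labels)
```

The returned labels are arbitrary integers, so they are only useful for coloring points or grouping items, not as class names.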

FAQs

  1. Does VRAE support multivariate data-sets? <br> Yes, it does. There's a parameter called number_of_features. For a multivariate timeseries, pass the number of variables per time step here; for a univariate timeseries, pass 1. Refer to this commit - e7b57a6<br><br>
  2. Does VRAE support unlabelled timeseries data-sets? <br> Auto-encoders are designed to work on unlabelled data, but you would need a subset of labelled data to validate the generated embeddings. This is similar to word vectors, where you can validate the embeddings semantically by checking relations such as "king - queen". There has to be a way to know whether VRAE is working properly (the train set needs some labelled data for this). That said, you don't need labels if you have another way of visualizing the clusters. <br><br>
  3. Does VRAE support classification as well? <br> No. VRAE only converts sparse timeseries to dense vectors. To get class assignments on top of it, the vectors need to be passed to another algorithm (k-means in my example).<br><br>
  4. In the ECG example shown, you show compression from 140 elements to 20 elements. What is an appropriate compression ratio for larger lengths, say 9k?<br> The number of time-series to be fed isn't a problem, but the dimension of each time-series is. To combat this, I propose 3 approaches as described below
  5. Is VRAE a stateful version?<br> Yes. VRAE makes extensive use of RNN (LSTM/GRU) blocks, which are themselves stateful in nature. Vanilla neural networks are stateless. Setting requires_grad=False makes a variable act like a constant, while requires_grad=True lets the network "learn" the variable's value through backprop.

Application Areas

  1. Anomaly detection
  2. Data reduction
  3. Determining products with similar selling patterns
  4. Product cannibalization
  5. New product introduction
  6. Low selling items
  7. Items that are introduced and phased out quite frequently (can't estimate seasonality, data < 1yr) <img src = "https://raw.githubusercontent.com/tejaslodaya/timeseries-clustering-vae/master/images/less_data.png" height="50%" width="50%">

Solution: Forecast items in groups (borrowed from here)

  1. Even though each item has a short/sparse life cycle, the clustered group has enough data.
  2. Modeling the group as a whole is more robust to outliers and missing data.
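
One way to picture the grouping idea is to pool the items of each cluster into a single average series. The sketch below assumes each item already has a cluster label (for example from clustering the latent vectors), that series may have different lengths, and that missing observations are stored as `None`. The function and variable names are hypothetical.

```python
from collections import defaultdict

def group_profiles(series_by_item, cluster_of):
    """Average the items of each cluster per time step, skipping missing values."""
    grouped = defaultdict(list)
    for item, series in series_by_item.items():
        grouped[cluster_of[item]].append(series)

    profiles = {}
    for cluster, members in grouped.items():
        length = max(len(s) for s in members)
        profile = []
        for t in range(length):
            # Use only items that have an observation at this time step.
            observed = [s[t] for s in members if t < len(s) and s[t] is not None]
            profile.append(sum(observed) / len(observed) if observed else None)
        profiles[cluster] = profile
    return profiles

# Toy sales histories: items "a" and "b" are short and sparse on their own.
sales = {"a": [1, None, 3], "b": [3, 4], "c": [10, 10, 10]}
cluster_of = {"a": 0, "b": 0, "c": 1}
profiles = group_profiles(sales, cluster_of)
print(profiles)
```

The group profile has a value at every time step where at least one member was observed, which is what makes the group easier to model than any single sparse item.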

Conclusion

We present a variational recurrent auto-encoder that learns the structure in timeseries. Training is unsupervised. When we color the latent vectors with the actual labels, we show that the structure makes sense.

Requirements

Repo works with:

References