Author: Ivan Bongiorni, Data Scientist. LinkedIn.

Convolutional Recurrent Seq2seq GAN for the Imputation of Missing Values in Time Series Data

<a href="url" align="center"><img src="https://github.com/IvanBongiorni/GAN-RNN_Timeseries-imputation/blob/master/utils/imputation_example_00.png" align="center" height="204" width="800" ></a> <a href="url" align="center"><img src="https://github.com/IvanBongiorni/GAN-RNN_Timeseries-imputation/blob/master/utils/imputation_example_02.png" align="center" height="204" width="800" ></a>

Description

This project implements several configurations of a Convolutional Recurrent Seq2seq neural network for imputing missing values in time series data. Three implementations are provided:

  1. A Convolutional Recurrent seq2seq model.
  2. A GAN (Generative Adversarial Network) based on the same architecture, where an Imputer is trained to fool a Discriminator network that tries to distinguish real from fake (imputed) time series.
  3. A partially adversarial model that combines the loss structures of the two previous models: the Imputer must reduce the true error loss while trying to fool a Discriminator at the same time.
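
To make the third configuration more concrete, here is a minimal sketch of how a reconstruction loss and an adversarial loss could be combined into a single objective for the Imputer. It is written in plain Python for readability and is not taken from the repository: the function names, the choice of mean absolute error and binary cross-entropy, and the `adv_weight` parameter are all assumptions made for illustration.

```python
import math


def reconstruction_loss(true_series, imputed_series):
    """Mean absolute error between the true and the imputed series.

    Assumption: MAE is used here only as an example of a 'true error' loss.
    """
    errors = [abs(t - p) for t, p in zip(true_series, imputed_series)]
    return sum(errors) / len(errors)


def adversarial_loss(discriminator_probs, eps=1e-7):
    """Binary cross-entropy paid by the Imputer.

    `discriminator_probs` holds the Discriminator's estimated probability
    that each imputed series is real. The loss is low when the
    Discriminator is fooled (probabilities close to 1).
    """
    logs = [math.log(max(p, eps)) for p in discriminator_probs]
    return -sum(logs) / len(logs)


def partially_adversarial_loss(true_series, imputed_series,
                               discriminator_probs, adv_weight=0.1):
    """Weighted sum of both losses (adv_weight is a hypothetical knob)."""
    return (reconstruction_loss(true_series, imputed_series)
            + adv_weight * adversarial_loss(discriminator_probs))


if __name__ == "__main__":
    loss = partially_adversarial_loss(
        true_series=[1.0, 2.0, 3.0],
        imputed_series=[1.0, 2.0, 5.0],
        discriminator_probs=[0.5],
    )
    print(f"combined Imputer loss: {loss:.4f}")
```

With `adv_weight=0` the objective reduces to the plain seq2seq model of point 1, and dropping the reconstruction term gives the Imputer side of the pure GAN setup of point 2.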

Models are implemented in TensorFlow 2 and trained on the Wikipedia Web Traffic Time Series Forecasting dataset.

<a href="url" align="center"><img src="https://github.com/IvanBongiorni/GAN-RNN_Timeseries-imputation/blob/master/utils/performance_comparison_3models.png" align="center" height="300" width="800" ></a>

<br/>

Files

Pipelines:

Scripts:

Notebooks and explanations:

Folders:

<br/>

Modules required

langdetect==1.0.8
numpy==1.18.3
pandas==1.0.3
scikit-learn==0.22.2.post1
scipy==1.4.1
tensorflow==2.1.0
<br/>

Bibliography

<br/>

Hardware

I trained these models on a fairly powerful machine: a System76 Adder WS laptop with 64 GB of RAM and an NVIDIA RTX 2070 GPU.