TimeVAE for Synthetic Timeseries Data Generation
TimeVAE is a model designed for generating synthetic time-series data using a Variational Autoencoder (VAE) architecture with interpretable components like level, trend, and seasonality. This repository includes the implementation of TimeVAE, as well as two baseline models: a dense VAE and a convolutional VAE.
Paper Reference
For a detailed explanation of the methodology, see the paper: TIMEVAE: A VARIATIONAL AUTO-ENCODER FOR MULTIVARIATE TIME SERIES GENERATION.
Project Information
This project implements the Variational Autoencoder architecture with modifications to the decoder to include interpretable components of time-series data: level, trend, and seasonality. Additionally, it provides two other baseline models:
- Dense VAE: A simple VAE with dense layers in the encoder and decoder.
- Convolutional VAE: A VAE with convolutional layers in the encoder and decoder, referred to as the base model in the paper.
See ./src/vae/ for the implementation of these models. Note that vae_base.py contains an abstract superclass and does not represent TimeVAE-Base.
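To make the three interpretable components concrete, here is a minimal sketch that builds a toy series as a level plus a polynomial trend plus a repeating seasonal pattern. It assumes the components combine additively and is meant only as an illustration of the idea, not as the decoder implemented in timevae.py. The function name compose_series and all of its parameters are hypothetical.

```python
import numpy as np


def compose_series(n_timesteps, level, trend_coefs, season_values):
    """Toy additive series: level + polynomial trend + seasonal pattern.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Time axis scaled to [0, 1) so polynomial terms stay well-behaved.
    t = np.arange(n_timesteps) / n_timesteps
    # Polynomial trend: trend_coefs[p] multiplies t ** (p + 1).
    trend = sum(c * t ** (p + 1) for p, c in enumerate(trend_coefs))
    # Seasonal pattern: repeat one season's values across the window.
    season = np.resize(np.asarray(season_values, dtype=float), n_timesteps)
    return level + trend + season


series = compose_series(
    n_timesteps=12,
    level=10.0,
    trend_coefs=[2.0],               # linear trend only
    season_values=[0.0, 1.0, -1.0],  # pattern repeating every 3 steps
)
print(series.shape)
```

Passing an empty trend_coefs list leaves only the level and seasonal terms, which is one way to picture a model with no trend component.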
Project Structure
TimeVAE/
├── data/ # Folder for datasets
├── outputs/ # Folder for model outputs
│ ├── gen_data/ # Folder for generated samples
│ ├── models/ # Folder for model artifacts
│ └── tsne/ # Folder for t-SNE plots
├── src/ # Source code
│ ├── config/ # Configuration files
│ │ └── hyperparameters.yaml # Hyperparameters settings
│ ├── vae/ # VAE models implementation
│ │ ├── timevae.py # Main TimeVAE model
│ │ ├── vae_base.py # Abstract superclass
│ │ ├── vae_conv_model.py # Convolutional VAE model (base model)
│ │ ├── vae_dense_model.py # Dense VAE model
│ │ └── vae_utils.py # utils to create, train, and use VAE models
│ ├── data_utils.py # utils for data loading, splitting and scaling
│ ├── paths.py # path variables for config file, data, models, and outputs
│ ├── vae_pipeline.py # Main pipeline script
│ └── visualize.py # Scripts for visualization, including t-SNE plots
├── LICENSE.md # License information
├── README.md # Readme file
└── requirements.txt # Dependencies
Installation
Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
Usage
- Prepare Data: Save your data as a numpy array with shape (n_samples, n_timesteps, n_features) in the ./data/ folder in .npz format. The filename without the extension will be used as the dataset name (e.g., my_data.npz will be referred to as my_data). Alternatively, use one of the existing datasets provided in the ./data/ folder.
- Configure Pipeline:
  - Update the dataset name and model type in ./src/vae_pipeline.py:
    dataset = "my_data" # Your dataset name
    model_name = "timeVAE" # Choose between vae_dense, vae_conv, or timeVAE
  - Set hyperparameters in ./src/config/hyperparameters.yaml. Key hyperparameters include latent_dim, hidden_layer_sizes, reconstruction_wt, and batch_size.
- Run the Script:
  python src/vae_pipeline.py
- Outputs:
  - Trained models are saved in ./outputs/models/<dataset_name>/.
  - Generated synthetic data is saved in ./outputs/gen_data/<dataset_name>/ in .npz format.
  - t-SNE plots are saved in ./outputs/tsne/<dataset_name>/ in .png format.
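The data preparation step above can be sketched as follows. This is a minimal example that writes a random toy dataset in the expected (n_samples, n_timesteps, n_features) layout. The key name "data" used inside the .npz archive is an assumption; check the loader in src/data_utils.py for the key it actually reads.

```python
import os

import numpy as np

# Toy dataset: 100 samples, 24 time steps, 3 features (random walks).
rng = np.random.default_rng(seed=0)
data = rng.normal(size=(100, 24, 3)).cumsum(axis=1)

# The pipeline expects a 3-D array: (n_samples, n_timesteps, n_features).
assert data.ndim == 3

# ASSUMPTION: the array is stored under the key "data".
os.makedirs("data", exist_ok=True)
out_path = os.path.join("data", "my_data.npz")
np.savez_compressed(out_path, data=data)

# Read the file back to confirm the key and shape survived the round trip.
with np.load(out_path) as loaded:
    restored = loaded["data"]
print(restored.shape)
```

With the file saved as my_data.npz, the dataset name to set in the pipeline script would be my_data.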
Hyperparameters
The four key hyperparameters for the VAE models are:
- latent_dim: Number of latent dimensions (default: 8).
- hidden_layer_sizes: Number of hidden units or filters (default: [50, 100, 200]).
- reconstruction_wt: Weight for the reconstruction loss (default: 3.0).
- batch_size: Training batch size (default: 16).
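As a rough illustration of what reconstruction_wt controls, the sketch below computes a weighted VAE objective in which the reconstruction error is scaled by the weight and added to the KL divergence term. This is a common formulation and is assumed here; the exact loss and reduction used in this repository may differ, and the function name weighted_vae_loss is hypothetical.

```python
import numpy as np


def weighted_vae_loss(x, x_hat, z_mean, z_log_var, reconstruction_wt=3.0):
    """Weighted VAE objective: reconstruction_wt * reconstruction + KL.

    Sketch only. ASSUMPTION: errors are summed over time steps and
    features, then averaged over the batch.
    """
    # Squared reconstruction error, summed per sample.
    recon = np.sum((x - x_hat) ** 2, axis=(1, 2))
    # KL divergence between N(z_mean, exp(z_log_var)) and N(0, I).
    kl = -0.5 * np.sum(
        1.0 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=1
    )
    return float(np.mean(reconstruction_wt * recon + kl))


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 24, 3))            # batch_size = 16
x_hat = x + 0.1 * rng.normal(size=x.shape)  # imperfect reconstruction
z_mean = np.zeros((16, 8))                  # latent_dim = 8
z_log_var = np.zeros((16, 8))

loss = weighted_vae_loss(x, x_hat, z_mean, z_log_var)
print(loss)
```

A larger weight pushes training toward faithful reconstruction, while a smaller one lets the KL term pull the latent distribution closer to the standard normal prior.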
For timeVAE:
- trend_poly: Degree of polynomial trend component (default: 0).
- custom_seas: Custom seasonalities as a list of tuples (default: null).
- use_residual_conn: Use residual connection (default: true).
With these defaults (no polynomial trend and no custom seasonalities), the timeVAE model operates as the base model without interpretable components.
Note
The default hyperparameters in the ./src/config/hyperparameters.yaml file were identified through extensive testing on numerous datasets and tend to perform well on most of them. However, you may want to tune them for your specific dataset.
License
This project is licensed under the MIT License. See the LICENSE.md file for details.
Contact Information
For any inquiries or collaborations, please contact the lead author at: <lead_author_first_name>.<lead_author_last_name>@gmail.com.
See the paper for author details.