

Learning from Irregularly-Sampled Time Series: A Missing Data Perspective

This repository provides a PyTorch implementation of the paper "Learning from Irregularly-Sampled Time Series: A Missing Data Perspective".


This repository requires Python 3.6 or later. The file requirements.txt lists the required Python modules and the versions we tested with. To install the requirements:

pip install -r requirements.txt


<img src="examples/images.png" alt="image completion" width="750" />


Under the image directory, the following commands train P-VAE and P-BiGAN for incomplete MNIST:

# P-VAE:
python mnist_pvae.py
# P-BiGAN:
python mnist_pbigan.py


For CelebA, you need to download the dataset from its website. Specifically, you may either:

Under the image directory, the following commands train P-VAE and P-BiGAN for incomplete CelebA:

# P-VAE:
python celeba_pvae.py
# P-BiGAN:
python celeba_pbigan.py

Command-line options

For both the MNIST and CelebA scripts, use the options --mask block --block-len n to specify "square observation" missingness with n-by-n observed blocks, and --mask indep --obs-prob .2 to specify "independent dropout" missingness with 80% of the pixels missing.
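To make the two missingness settings concrete, the NumPy sketch below builds one mask of each kind for a 28x28 (MNIST-sized) image, with 1 marking an observed pixel and 0 a missing one. The helpers block_mask and indep_mask are hypothetical illustrations, not functions from this repository. Placing a single observed square uniformly at random is also an assumption.

```python
import numpy as np

def block_mask(height, width, block_len, rng):
    """Hypothetical helper for "square observation" missingness: one
    block_len x block_len square is observed (1), all other pixels are
    missing (0). The uniformly random position is an assumption."""
    mask = np.zeros((height, width), dtype=np.uint8)
    top = rng.integers(0, height - block_len + 1)
    left = rng.integers(0, width - block_len + 1)
    mask[top:top + block_len, left:left + block_len] = 1
    return mask

def indep_mask(height, width, obs_prob, rng):
    """Hypothetical helper for "independent dropout": each pixel is
    observed independently with probability obs_prob."""
    return (rng.random((height, width)) < obs_prob).astype(np.uint8)

rng = np.random.default_rng(0)
m_block = block_mask(28, 28, block_len=12, rng=rng)  # like --mask block --block-len 12
m_indep = indep_mask(28, 28, obs_prob=0.2, rng=rng)  # like --mask indep --obs-prob .2

print(int(m_block.sum()))            # 144: exactly 12 * 12 observed pixels
print(round(1 - m_indep.mean(), 2))  # fraction of missing pixels, close to 0.8
```

With --obs-prob .2 each pixel is kept with probability 0.2, so about 80% of the pixels are missing on average rather than exactly 80% per image.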

Use -h to see all available command-line options for each script. This also works for the time series scripts described below.

Time Series

Our implementation takes as input a time series dataset stored as a NumPy npz file that contains three tensors: time, data and mask. For a dataset of N data cases, each with C channels and at most L observations (time-value pairs) per channel, the tensors time, data and mask each have size (N, C, L).

The script gen_toy_data.py shows how to create a synthetic time series dataset in this format.
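As a rough illustration of the layout, the sketch below builds a tiny dataset and saves it as an npz file. It is not a copy of gen_toy_data.py. Several choices here are assumptions made only for this example: each channel's observations are packed at the front of its L slots, the remaining slots are zero-padded, mask is 1 for real observations, the dtype is float32, and the file is named toy_data.npz.

```python
import numpy as np

# Hypothetical toy sizes: N cases, C channels, at most L observations per channel.
N, C, L = 4, 2, 5
rng = np.random.default_rng(0)

time = np.zeros((N, C, L), dtype=np.float32)
data = np.zeros((N, C, L), dtype=np.float32)
mask = np.zeros((N, C, L), dtype=np.float32)

for n in range(N):
    for c in range(C):
        # Each channel has its own number of observations (1..L) at its own times.
        k = int(rng.integers(1, L + 1))
        t = np.sort(rng.uniform(0.0, 1.0, size=k))
        time[n, c, :k] = t
        data[n, c, :k] = np.sin(2 * np.pi * t)  # arbitrary example signal
        mask[n, c, :k] = 1.0                    # 1 = real observation, 0 = padding

# Assumed file name; the arrays are stored under the keys time, data and mask.
np.savez("toy_data.npz", time=time, data=data, mask=mask)

with np.load("toy_data.npz") as f:
    print(sorted(f.files))  # ['data', 'mask', 'time']
    print(f["time"].shape)  # (4, 2, 5)
```

Padding to a common length L keeps all three tensors rectangular even though channels differ in how many observations they have. The mask tells the model which of the L slots carry real time-value pairs.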

Synthetic data

This notebook provides an overview of P-VAE and P-BiGAN and demonstrates how to train them on a synthetic dataset.

<img src="examples/time-series.png" alt="time series imputation" width="600" />

Under the time-series directory, the following commands train P-VAE and P-BiGAN on a synthetic multivariate time series dataset:

# P-VAE:
python toy_pvae.py
# P-BiGAN:
python toy_pbigan.py


MIMIC-III can be downloaded by following the instructions on its website.

For the experiments, we apply the optional preprocessing used in this work to the MIMIC-III dataset.

For the time series classification task, our implementation accepts labeled time series data in one of the following three formats:

  1. Unsplit format with an additional label vector, stored in the following 4 fields. The data will be randomly split into training, test and validation sets.
    • (time|data|mask): numpy array of shape (N, C, L) as described before.
    • label: binary label of shape (N,).
  2. Data come with a train/test split, stored in the following 8 fields. The training set will then be split into a smaller training set (80%) and a validation set (20%).
    • (train|test)_(time|data|mask)
    • (train|test)_label
  3. Data come with a train/test/validation split, stored in the following 12 fields. This is useful for model selection based on a metric evaluated on the validation set across multiple runs (with different randomness).
    • (train|test|val)_(time|data|mask)
    • (train|test|val)_label
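The sketch below writes labeled data using the field names listed above. save_labeled and random_case are hypothetical helpers, and the file name format3.npz was made up for the example. Passing a single empty prefix produces format 1. Passing train/test prefixes produces format 2, and adding val produces format 3.

```python
import numpy as np

def save_labeled(path, splits):
    """Hypothetical helper: write labeled time series to an npz file.

    `splits` maps a prefix to a (time, data, mask, label) tuple:
      {"": ...}                                -> format 1 (4 fields)
      {"train": ..., "test": ...}              -> format 2 (8 fields)
      {"train": ..., "test": ..., "val": ...}  -> format 3 (12 fields)
    Returns the sorted list of field names that were written.
    """
    arrays = {}
    for prefix, (time, data, mask, label) in splits.items():
        if not (time.ndim == 3 and time.shape == data.shape == mask.shape):
            raise ValueError("time, data and mask must share shape (N, C, L)")
        if label.shape != (time.shape[0],):
            raise ValueError("label must have shape (N,)")
        head = f"{prefix}_" if prefix else ""
        arrays.update({head + "time": time, head + "data": data,
                       head + "mask": mask, head + "label": label})
    np.savez(path, **arrays)
    return sorted(arrays)

def random_case(n, c=3, length=6, seed=0):
    """Random (time, data, mask, label) with front-packed observations."""
    rng = np.random.default_rng(seed)
    counts = rng.integers(1, length + 1, size=(n, c, 1))   # observations per channel
    mask = (np.arange(length) < counts).astype(np.float32)  # first `count` slots observed
    time = (np.sort(rng.random((n, c, length)), axis=-1) * mask).astype(np.float32)
    data = (rng.standard_normal((n, c, length)) * mask).astype(np.float32)
    label = rng.integers(0, 2, size=n)                      # binary labels of shape (N,)
    return time, data, mask, label

keys = save_labeled("format3.npz", {
    "train": random_case(60, seed=1),
    "val": random_case(20, seed=2),
    "test": random_case(20, seed=3),
})
print(len(keys))  # 12
```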

The function split_data in time_series.py demonstrates how the data file is read and split into training, test and validation sets. You can follow it to create your own time series data.
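For reference, here is a hypothetical loader in the same spirit. It is not the actual split_data, and its name, signature and split fractions are assumptions. It detects the format from the field names and applies the 80%/20% train/validation split to format 2. It splits format 1 randomly, using 20% test and 20% validation by default. Those format 1 ratios are an assumption because they are not stated here.

```python
import numpy as np

FIELDS = ("time", "data", "mask", "label")

def _subset(arrays, idx):
    """Select the same cases (first axis) from every array."""
    return tuple(a[idx] for a in arrays)

def load_and_split(path, val_frac=0.2, test_frac=0.2, seed=0):
    """Hypothetical loader, not the repository's split_data.

    Returns {"train", "val", "test"} -> (time, data, mask, label).
    """
    rng = np.random.default_rng(seed)
    with np.load(path) as f:
        keys = set(f.files)

        def take(prefix):
            return tuple(f[prefix + name] for name in FIELDS)

        if all("val_" + name in keys for name in FIELDS):    # format 3: 12 fields
            return {s: take(s + "_") for s in ("train", "val", "test")}
        if all("train_" + name in keys for name in FIELDS):  # format 2: 8 fields
            train, test = take("train_"), take("test_")
            idx = rng.permutation(len(train[-1]))
            n_val = int(round(val_frac * len(idx)))           # 20% by default
            return {"train": _subset(train, idx[n_val:]),
                    "val": _subset(train, idx[:n_val]),
                    "test": test}
        full = take("")                                       # format 1: 4 fields
    idx = rng.permutation(len(full[-1]))
    n_test = int(round(test_frac * len(idx)))
    n_val = int(round(val_frac * len(idx)))
    return {"train": _subset(full, idx[n_test + n_val:]),
            "val": _subset(full, idx[n_test:n_test + n_val]),
            "test": _subset(full, idx[:n_test])}

# Tiny format-1 file (hypothetical name); time[:, 0, 0] stores each case's index.
N = 10
time = np.zeros((N, 1, 3))
time[:, 0, 0] = np.arange(N)
np.savez("format1_demo.npz", time=time, data=np.zeros((N, 1, 3)),
         mask=np.ones((N, 1, 3)), label=np.arange(N) % 2)
splits = load_and_split("format1_demo.npz")
print({s: len(v[-1]) for s, v in splits.items()})  # {'train': 6, 'val': 2, 'test': 2}
```

Format 3 is returned as stored. This keeps the validation set fixed across runs, which is what makes it suitable for model selection over multiple runs.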

Once the time series data is ready, run the following command under the time-series directory:

# P-VAE:
python mimic3_pvae.py
# P-BiGAN:
python mimic3_pbigan.py


If you find our work relevant to your research, please cite:

  title     = {Learning from Irregularly-Sampled Time Series: A Missing Data Perspective},
  author    = {Li, Steven Cheng-Xian and Marlin, Benjamin M.},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  year      = {2020}


Your feedback would be greatly appreciated! Reach us at li.stevecx@gmail.com.