Overview

Paper: Mourad Khayati, Ines Arous, Zakhar Tymchenko and Philippe Cudré-Mauroux: ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams. PVLDB 2021.
Algorithms: The benchmark evaluates all the algorithms mentioned in the paper: ORBITS, SPIRIT, SAGE, OGDImpute, pcaMME, TKCM and M-RNN*. To enable/disable any algorithm, please refer to the Algorithms customization section below.
Datasets: The benchmark evaluates all the datasets used in the paper: gas (drift10), motion, bafu and soccer*. To enable/disable any dataset, please refer to the Datasets customization section below.
Scenarios: The benchmark will execute the full set of 11 recovery scenarios and report the error using RMSE, MSE and MAE. A detailed description of the recovery scenarios can be found here.
Reproducibility: We provide a dedicated repo for reproducing all the results reported in the paper.

*Disabled by default because each takes a couple of days to run.

Prerequisites | Build | Execution | Benchmark Customization | Citation

Prerequisites

Ubuntu 18 or 20 (including Ubuntu derivatives, e.g., Xubuntu).
Clone this repository.
Mono: Install mono from https://www.mono-project.com/download/stable/ (takes a few minutes)

Build

Build the Testing Framework using the installation script located in the root folder (takes a few minutes):

    $ sh install_linux.sh

Execution

    $ cd TestingFramework/bin/Debug/
    $ mono TestingFramework.exe

The test suite with the default setup will take ~20 hours to finish.

Results: All results are written to the Results folder. The accuracy results of all algorithms are added sequentially, for each scenario and dataset, to Results/.../.../.../error/. The runtime results of all algorithms are added to Results/.../.../.../runtime/. The plots of the recovered blocks are added to Results/.../.../.../plots/.
Scenarios creation: To compare your own technique against the benchmark results outside the framework, we provide a command that exports the missing-value scenarios/patterns for a given dataset:

    $ cd TestingFramework/bin/Debug/
    $ mono TestingFramework.exe export dataset_name

This command will produce contaminated data (where missing values are designated as NaN) in the Export/ folder for each streaming scenario in the benchmark.
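
Once an external technique has filled in the exported data, its output can be scored with the same three metrics the benchmark reports (RMSE, MSE and MAE). The Python sketch below is illustrative only. It assumes that the exported files are space-separated matrices like the input datasets, and that the error is measured only on the cells that are NaN in the contaminated file; this README confirms neither assumption. The helpers parse_matrix and recovery_errors are hypothetical names and are not part of the Testing Framework.

```python
import math


def parse_matrix(text):
    """Parse space-separated rows into lists of floats; the token 'NaN' becomes float('nan')."""
    return [
        [float(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]


def recovery_errors(reference, contaminated, recovered):
    """Return (rmse, mse, mae) over the cells that are NaN in `contaminated`.

    Assumption: only the missing cells count towards the error.
    """
    diffs = [
        rec - ref
        for ref_row, con_row, rec_row in zip(reference, contaminated, recovered)
        for ref, con, rec in zip(ref_row, con_row, rec_row)
        if math.isnan(con)
    ]
    if not diffs:
        raise ValueError("no missing values found in the contaminated data")
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return math.sqrt(mse), mse, mae


if __name__ == "__main__":
    # Tiny in-memory example; real data would be read from the Export/ folder.
    reference = parse_matrix("1 2\n3 4\n5 6")
    contaminated = parse_matrix("1 NaN\n3 NaN\n5 6")
    recovered = parse_matrix("1 2.5\n3 3.0\n5 6")
    rmse, mse, mae = recovery_errors(reference, contaminated, recovered)
    print(f"RMSE={rmse:.4f} MSE={mse:.4f} MAE={mae:.4f}")
```

The three matrices must have the same shape; zip silently truncates to the shortest input, so a shape check is worth adding before relying on the numbers.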

Benchmark Customization

Algorithms customization

To enable an additional algorithm:
- Open the file TestingFramework/config.cfg
- Add the name of the algorithm to the line EnabledAlgorithms =

Datasets customization

All the datasets used in this paper can be found in: TestingFramework/bin/Debug/data/
To enable an additional dataset:
- Open the file TestingFramework/config.cfg
- Add the name of the dataset to the line Datasets =
To add a new dataset to the benchmark:
- Import the file to TestingFramework/bin/Debug/data/{name}/{name}_normal.txt (where {name} is the name of your dataset).
- Requirements: rows >= 1000; columns >= 10; column separator = space; row separator = newline
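
Before importing a new dataset, the requirements above can be checked with a short script. The following Python sketch is a hedged illustration: the functions dataset_path and check_dataset are hypothetical helpers, not part of the Testing Framework, and the check that every row has the same number of columns is an added assumption that the requirements do not state explicitly.

```python
def dataset_path(name):
    """Build the target path for a new dataset, following the pattern given above."""
    return f"TestingFramework/bin/Debug/data/{name}/{name}_normal.txt"


def check_dataset(text, min_rows=1000, min_cols=10):
    """Return a list of problems found in `text`; an empty list means all checks passed."""
    problems = []
    # Rows are separated by newlines; blank lines are ignored.
    rows = [line for line in text.split("\n") if line.strip()]
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, found {len(rows)}")

    widths = set()
    separator_ok = True
    for number, line in enumerate(rows, start=1):
        # Columns must be separated by single spaces.
        tokens = line.strip().split(" ")
        widths.add(len(tokens))
        if separator_ok:
            try:
                for token in tokens:
                    float(token)
            except ValueError:
                problems.append(f"row {number}: non-numeric or wrongly separated value")
                separator_ok = False  # report only the first offending row

    if len(widths) > 1:
        # Assumption: the data is a rectangular matrix.
        problems.append("rows have differing column counts")
    elif widths and min(widths) < min_cols:
        problems.append(f"expected at least {min_cols} columns, found {min(widths)}")
    return problems


if __name__ == "__main__":
    # Synthetic 1000 x 10 matrix, used only to demonstrate the call.
    sample = "\n".join(" ".join(str(i + j) for j in range(10)) for i in range(1000))
    print(dataset_path("mydata"))
    print(check_dataset(sample) or "all checks passed")
```

To check a real file, pass its contents, for example check_dataset(open(path).read()), and fix any reported problem before adding the dataset name to config.cfg.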

Scenario customization

To enable an additional recovery scenario:
- Open the file TestingFramework/config.cfg
- Add the name of the scenario to the line Scenarios =

Citation

@inproceedings{orbits2021vldb,
 author    = {Mourad Khayati and Ines Arous and Zakhar Tymchenko and Philippe Cudr{\'{e}}{-}Mauroux},
 title     = {ORBITS: Online Recovery of Missing Values in Multiple Time Series Streams},
 booktitle = {Proceedings of the VLDB Endowment},
 volume    = {14},
 number    = {3},
 year      = {2021}
}