ImputeBench: Benchmark of Imputation Techniques in Time Series

ImputeBench implements over 15 advanced imputation techniques for missing blocks in time series. It evaluates their precision and runtime on various real-world time series datasets using different recovery scenarios. Technical details can be found in our PVLDB 2020 paper: <a href="http://www.vldb.org/pvldb/vol13/p768-khayati.pdf">Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series</a>. The benchmark can be easily extended with new algorithms (C/C++, Python, or Matlab), datasets, and scenarios.

Prerequisites | Build | Execution | Extension | Contributors | Award | Citation


Prerequisites


Build

    $ sh install_linux.sh
<!--- This will install a virtual environment (`bench-env`) under which the packages for this version will be installed. To use algorithms built using this Python version (DeepMVI, MPIN), you need to activate this virtual environment (example provided in next section). -->

Execution

    $ cd TestingFramework/bin/Debug/
    $ mono TestingFramework.exe [arguments]

Arguments

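| -alg | -d | -scen |
| -------- | -------- | -------- |
| cdrec | airq | miss_perc |
| dynammo | bafu | ts_length |
| grouse | chlorine | ts_nbr |
| rosl | climate | miss_disj |
| softimp | drift10 | miss_over |
| svdimp | electricity | mcar |
| svt | meteo | blackout |
| stmvl | temp | all |
| spirit | bafu_red | |
| tenmf | drift10_red | |
| tkcm | all | |
| trmf | | |
| all | | |
| New algs: | | |
| ssa | | |
| m-rnn | | |
| brits | | |
| deepmvi | | |
| mpin | | |
| pristi | | |
| iim | | |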
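
The argument values listed above can be checked before launching a long run. The following is a minimal Python sketch, not part of ImputeBench: `build_command` is a hypothetical helper, the value sets are copied from the list above (plus `mvexport`, which appears in the execution examples), and a single dataset name per call is assumed.

```python
from typing import List, Sequence

# Values copied from the argument list in this README.
# "mvexport" is taken from the execution examples.
ALGORITHMS = {
    "cdrec", "dynammo", "grouse", "rosl", "softimp", "svdimp", "svt",
    "stmvl", "spirit", "tenmf", "tkcm", "trmf", "all",
    "ssa", "m-rnn", "brits", "deepmvi", "mpin", "pristi", "iim",
    "mvexport",
}
DATASETS = {
    "airq", "bafu", "chlorine", "climate", "drift10", "electricity",
    "meteo", "temp", "bafu_red", "drift10_red", "all",
}
SCENARIOS = {
    "miss_perc", "ts_length", "ts_nbr", "miss_disj", "miss_over",
    "mcar", "blackout", "all",
}


def build_command(algs: Sequence[str], dataset: str, scenario: str) -> List[str]:
    """Return the argv list for one benchmark run (hypothetical helper)."""
    unknown = [a for a in algs if a not in ALGORITHMS]
    if not algs or unknown:
        raise ValueError(f"unknown or empty algorithm list: {unknown}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    # Several algorithms are passed as one comma-separated group.
    return ["mono", "TestingFramework.exe",
            "-alg", ",".join(algs), "-d", dataset, "-scen", scenario]
```

The returned list can be passed to `subprocess.run` from within `TestingFramework/bin/Debug/`.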

Results

All results and plots will be added to the Results folder. The accuracy results of all algorithms will be sequentially added for each scenario and dataset to: Results/.../.../error/. The runtime results of all algorithms will be added to: Results/.../.../runtime/. The plots of the recovered blocks will be added to the folder Results/.../.../recovery/plots/.
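
To locate the output files after a run, the folder layout described above can be walked programmatically. This is a minimal Python sketch under the assumption that the only fixed parts of the layout are the directory names `error`, `runtime`, and `recovery/plots`; the intermediate folders are left unspecified, and `collect_results` is a hypothetical helper that only lists files without parsing them.

```python
from pathlib import Path
from typing import Dict, List


def collect_results(results_root: str) -> Dict[str, List[Path]]:
    """Group files under the Results folder by the kind of output they hold."""
    root = Path(results_root)
    found: Dict[str, List[Path]] = {"error": [], "runtime": [], "plots": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Directory components between the root and the file itself.
        dirs = path.relative_to(root).parts[:-1]
        in_plots = any(a == "recovery" and b == "plots"
                       for a, b in zip(dirs, dirs[1:]))
        if "error" in dirs:
            found["error"].append(path)
        elif "runtime" in dirs:
            found["runtime"].append(path)
        elif in_plots:
            found["plots"].append(path)
    return found
```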

Execution examples

  1. Run a single algorithm (cdrec) on a single dataset (drift10) using one scenario (missing percentage)
    $ mono TestingFramework.exe -alg cdrec -d drift10 -scen miss_perc
  2. Run two algorithms (cdrec, spirit) on a single dataset (drift10) using one scenario (missing percentage)
    $ mono TestingFramework.exe -alg cdrec,spirit -d drift10 -scen miss_perc
  3. Run the two algorithms from point 2 without runtime results
    $ mono TestingFramework.exe -alg cdrec,spirit -d drift10 -scen miss_perc -nort
  4. Run the whole VLDB'20 benchmark (all algorithms, all datasets, all scenarios, precision and runtime)
    $ mono TestingFramework.exe -alg all -d all -scen all

Warning: Running the whole benchmark takes a sizeable amount of time (up to 4 days, depending on the hardware) and produces up to 15 GB of output files with all recovered data and plots unless stopped early.

  5. Create patterns of missing blocks on one complete dataset (airq) using one scenario (missing percentage)
    $ mono TestingFramework.exe -alg mvexport -d airq -scen miss_perc

Note: You must run each scenario separately on one or multiple datasets. Each time you execute one scenario, the Results folder will be overwritten with the new files.

  6. Additional command-line parameters
    $ mono TestingFramework.exe --help
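
Because the Results folder is overwritten on every run, one way to keep the output of several scenarios is to run them one at a time and copy the folder aside after each run. The sketch below is an illustration in Python, not part of the benchmark: `run_scenarios` is a hypothetical helper, the locations of the binary folder and the Results folder are passed in as parameters because they depend on your checkout, and the `runner` argument exists only so the loop can be exercised without `mono` installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def run_scenarios(algs: Sequence[str], dataset: str, scenarios: Sequence[str],
                  bin_dir: str, results_dir: str, archive_dir: str,
                  runner: Callable = subprocess.run) -> List[Path]:
    """Run each scenario separately and archive Results after each run."""
    archived = []
    for scen in scenarios:
        cmd = ["mono", "TestingFramework.exe",
               "-alg", ",".join(algs), "-d", dataset, "-scen", scen]
        # Executed from the folder that contains TestingFramework.exe.
        runner(cmd, check=True, cwd=bin_dir)
        # Copy the freshly written Results folder before the next run
        # overwrites it (dirs_exist_ok requires Python 3.8+).
        target = Path(archive_dir) / scen
        shutil.copytree(results_dir, target, dirs_exist_ok=True)
        archived.append(target)
    return archived
```

A call such as `run_scenarios(["cdrec"], "airq", ["miss_perc", "ts_nbr"], bin_dir, results_dir, "archive")` would leave one sub-folder per scenario under `archive/`.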

Parametrized execution

    $ mono TestingFramework.exe -algx svdimp 4 -d drift10 -scen ts_nbr
    $ mono TestingFramework.exe -alg stmvl -algx cdrec 4 -d airq -scen ts_nbr

Remark: Unlike -alg, the -algx option does not accept a comma-separated group of algorithms. It must be repeated before the name of each parametrized algorithm.
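
The difference between grouped `-alg` values and per-algorithm `-algx` values can be sketched as follows. This Python snippet is only an illustration: `algorithm_args` is a hypothetical helper, and treating the parameter as a single integer is an assumption based on the two examples above.

```python
from typing import Dict, List, Sequence


def algorithm_args(plain: Sequence[str], parametrized: Dict[str, int]) -> List[str]:
    """Build the algorithm part of the command line."""
    args: List[str] = []
    if plain:
        # Algorithms without a parameter share one comma-separated -alg group.
        args += ["-alg", ",".join(plain)]
    for name, param in parametrized.items():
        # Each parametrized algorithm gets its own -algx flag.
        args += ["-algx", name, str(param)]
    return args
```

For instance, `algorithm_args(["stmvl"], {"cdrec": 4})` yields the algorithm arguments of the second command above.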


Executing New Algorithms

    $ sh install_extra.sh
    $ source bench-env/bin/activate
    $ mono TestingFramework.exe [arguments]
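
A forgotten `source bench-env/bin/activate` can be caught before a run starts. The following Python sketch relies on the `VIRTUAL_ENV` variable that a virtual environment's `activate` script sets; `missing_env_for` is a hypothetical helper, and treating every entry of the "New algs" list as requiring `bench-env` is an assumption.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Names from the "New algs" part of the argument list (assumed to need bench-env).
NEW_ALGS = {"ssa", "m-rnn", "brits", "deepmvi", "mpin", "pristi", "iim"}


def missing_env_for(algs: Iterable[str],
                    environ: Optional[Mapping[str, str]] = None,
                    env_name: str = "bench-env") -> List[str]:
    """Return the requested new algorithms if bench-env is not active."""
    environ = os.environ if environ is None else environ
    # VIRTUAL_ENV holds the path of the active virtual environment, if any.
    active = os.path.basename(environ.get("VIRTUAL_ENV", "").rstrip("/\\"))
    if active == env_name:
        return []
    return sorted(a for a in algs if a in NEW_ALGS)
```

An empty return value means either that no new algorithm was requested or that the environment is already active.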

Extension


Contributors

Mourad Khayati (mkhayati@exascale.info) and Zakhar Tymchenko (zakhar.tymchenko@unifr.ch).


Award

ImputeBench has received the VLDB 2020 Most Reproducible Paper Award.


Citation

@inproceedings{imputebench2020vldb,
 author    = {Mourad Khayati and Alberto Lerner and Zakhar Tymchenko and Philippe Cudr{\'{e}}{-}Mauroux},
 title     = {Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series},
 booktitle = {Proceedings of the VLDB Endowment},
 volume    = {13},
 number    = {5},
 year      = {2020}
}
<!--- ### Optional commands | Argument | Description | Options | Remarks | | -------- | -------- | -------- | -------- | | -nort | Doesn't test runtime of the algorithms | n/a | - | | -noprec | Doesn't test precision of the algorithms | n/a | - | | -novis | Doesn't render plots which show the recovered block | n/a | - | | -out [folder] | Redirects results from default folder to a custom one | [folder] : a folder to store the results | Folder will be created is it doesn't exist. Existing files might be overwritten. | --->