
<a href="https://predict-idlab.github.io/tsflex"><img alt="tsflex" src="https://raw.githubusercontent.com/predict-idlab/tsflex/main/docs/_static/logo.png" width="66%"></a>

tsflex is a toolkit for flexible time series processing & feature extraction that is efficient and makes few assumptions about sequence data.

Installation

package manager	command
pip	`pip install tsflex`
conda	`conda install -c conda-forge tsflex`

Usage

tsflex is built to be intuitive, so we encourage you to copy-paste this code and toy with some parameters!

<a href="https://predict-idlab.github.io/tsflex/features/#getting-started">Feature extraction</a>

import numpy as np
import pandas as pd
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
from tsflex.utils.data import load_empatica_data

# 1. Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc, df_ibi = load_empatica_data(['tmp', 'acc', 'ibi'])

# 2. Construct your feature extraction configuration
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.min, np.mean, np.std, ss.skew, ss.kurtosis],
        series_names=["TMP", "ACC_x", "ACC_y", "IBI"],
        windows=["15min", "30min"],
        strides="15min",
    )
)

# 3. Extract features
fc.calculate(data=[df_tmp, df_acc, df_ibi], approve_sparsity=True)

Note that the feature extraction is performed on multivariate data with varying sample rates.

signal	columns	sample rate
df_tmp	["TMP"]	4 Hz
df_acc	["ACC_x", "ACC_y", "ACC_z"]	32 Hz
df_ibi	["IBI"]	irregularly sampled
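
To make the `windows` and `strides` arguments more concrete, here is a minimal, standard-library-only sketch of time-based strided windowing on an irregularly sampled series. It is an illustration of the idea, not tsflex's implementation: the helper `strided_window_features`, its window-boundary convention, and the output naming scheme are all hypothetical.

```python
from bisect import bisect_left
from statistics import fmean


def strided_window_features(times, values, window, stride, funcs):
    """Apply each function to every [start, start + window) segment.

    times  : sorted sample timestamps in seconds (may be irregular)
    values : one sample value per timestamp
    funcs  : mapping of feature name -> callable taking a list of values
    """
    rows = []
    start = times[0]
    # Keep only windows that fit fully inside the recorded time range.
    while start + window <= times[-1]:
        lo = bisect_left(times, start)           # first sample with t >= start
        hi = bisect_left(times, start + window)  # first sample with t >= end
        segment = values[lo:hi]
        if segment:  # skip windows that contain no samples (sparse data)
            row = {"window_end": start + window}
            for name, func in funcs.items():
                # Hypothetical naming scheme: <feature>__w=<window in seconds>
                row[f"{name}__w={window}"] = func(segment)
            rows.append(row)
        start += stride
    return rows


# Irregularly sampled toy series (timestamps in seconds).
times = [0, 1, 2, 4, 7, 8, 9, 12]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

features = strided_window_features(
    times, values, window=4, stride=4, funcs={"min": min, "mean": fmean}
)
```

Because the windows are defined on the time axis rather than on a sample count, the same code works for series with different or irregular sample rates.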

<a href="https://predict-idlab.github.io/tsflex/processing/index.html#getting-started">Processing</a>

A working example is available in the processing documentation linked above.

Why tsflex? ✨

Flexible:
- handles multivariate/multimodal time series
- versatile function support => integrates with many packages for:
  - processing (e.g., scipy.signal, statsmodels.tsa)
  - feature extraction (e.g., numpy, scipy.stats, antropy, nolds, seglearn¹, tsfresh¹, tsfel¹)
- feature extraction handles multiple strides & window sizes

Efficient:
- view-based operations for processing & feature extraction => extremely low memory peak & fast execution time
  - see: feature extraction benchmark visualization

Intuitive:
- maintains the sequence-index of the data
- feature extraction constructs interpretable output column names
- intuitive API

Few assumptions about the sequence data:
- no assumptions about sampling rate
- able to deal with multivariate asynchronous data, i.e., data with small time-offsets between the modalities

Advanced functionalities:
- apply FeatureCollection.reduce after feature selection for faster inference
- use function execution time logging to discover processing and feature extraction bottlenecks
- embedded SeriesPipeline & FeatureCollection serialization
- time series chunking

¹ These integrations are shown in integration-example notebooks.
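
As a rough illustration of why view-based windowing keeps the memory peak low, the sketch below uses NumPy's `sliding_window_view` to build strided windows without copying any samples. This is a generic NumPy example chosen for illustration; it makes no claim about how tsflex implements its windows internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal = np.arange(10, dtype=np.float64)

# All length-4 windows, then keep every 2nd one (a stride of 2 samples).
windows = sliding_window_view(signal, window_shape=4)[::2]

# The windows are a view on `signal`: no sample is copied.
shares = np.shares_memory(windows, signal)

# A feature is computed per window by reducing along the last axis.
window_means = windows.mean(axis=1)
```

Only the reduced feature values (`window_means`) allocate new memory; the windowed array itself reuses the buffer of `signal`.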

Future work 🔨

- scikit-learn integration for both processing and feature extraction (note: this is actively being developed on the sklearn integration branch)
- support for time series segmentation (exposing the under-the-hood strided-rolling functionality) - see this issue
- support for multi-indexed dataframes

=> Also see the enhancement issues

Contributing 👪

We are thrilled to see your contributions to further enhance tsflex. See this guide for more instructions on how to contribute.

Referencing our package

If you use tsflex in a scientific publication, we would highly appreciate it if you cite us as:

@article{vanderdonckt2021tsflex,
    author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
    title = {tsflex: flexible time series processing \& feature extraction},
    journal = {SoftwareX},
    year = {2021},
    url = {https://github.com/predict-idlab/tsflex},
    publisher={Elsevier}
}

Link to the paper: https://www.sciencedirect.com/science/article/pii/S2352711021001904

👤 Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost