<p align="center"> <a href="https://predict-idlab.github.io/tsflex"><img alt="tsflex" src="https://raw.githubusercontent.com/predict-idlab/tsflex/main/docs/_static/logo.png" width="66%"></a></p>
tsflex is a toolkit for flexible time series processing & feature extraction that is efficient and makes few assumptions about sequence data.
Installation
installer | command |
---|---|
pip | pip install tsflex |
conda | conda install -c conda-forge tsflex |
Usage
tsflex is built to be intuitive, so we encourage you to copy-paste this code and toy with some parameters!
<a href="https://predict-idlab.github.io/tsflex/features/#getting-started">Feature extraction</a>
import numpy as np
import pandas as pd
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
from tsflex.utils.data import load_empatica_data

# 1. Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc, df_ibi = load_empatica_data(['tmp', 'acc', 'ibi'])

# 2. Construct your feature extraction configuration
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.min, np.mean, np.std, ss.skew, ss.kurtosis],
        series_names=["TMP", "ACC_x", "ACC_y", "IBI"],
        windows=["15min", "30min"],
        strides="15min",
    )
)

# 3. Extract features
fc.calculate(data=[df_tmp, df_acc, df_ibi], approve_sparsity=True)
Note that the feature extraction is performed on multivariate data with varying sample rates.
signal | columns | sample rate |
---|---|---|
df_tmp | ["TMP"] | 4Hz |
df_acc | ["ACC_x", "ACC_y", "ACC_z" ] | 32Hz |
df_ibi | ["IBI"] | irregularly sampled |
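To make the window and stride parameters concrete, here is a minimal plain-Python sketch of time-based strided windows over irregularly sampled data. It is not how tsflex is implemented: the helper `strided_window_means` is hypothetical, and the window alignment (left-closed, right-open, starting at the first timestamp) is an assumption made for illustration only. It shows that windows defined in time, rather than in number of samples, can hold different sample counts, or none at all.

```python
from bisect import bisect_left
from statistics import mean


def strided_window_means(timestamps, values, window, stride):
    """Mean of `values` in [start, start + window), for starts spaced by `stride`.

    Hypothetical helper for illustration. `timestamps` must be sorted
    (in seconds); the spacing between samples may be irregular.
    Returns a list of (window_start, mean_or_None) tuples.
    """
    out = []
    start, last = timestamps[0], timestamps[-1]
    while start + window <= last:  # only emit full windows
        lo = bisect_left(timestamps, start)           # first sample >= start
        hi = bisect_left(timestamps, start + window)  # first sample >= window end
        segment = values[lo:hi]
        out.append((start, mean(segment) if segment else None))
        start += stride
    return out


# Irregular timestamps with a gap between t=3 and t=9
ts = [0.0, 1.0, 2.5, 3.0, 9.0, 9.5, 10.0, 12.0]
vals = [1, 2, 3, 4, 5, 6, 7, 8]
result = strided_window_means(ts, vals, window=4.0, stride=2.0)
print(result)
# [(0.0, 2.5), (2.0, 3.5), (4.0, None), (6.0, 5.5), (8.0, 6)]
```

The window starting at 4.0 falls entirely inside the gap and therefore contains no samples.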
<a href="https://predict-idlab.github.io/tsflex/processing/index.html#getting-started">Processing</a>
Why tsflex? ✨
- Flexible:
  - handles multivariate/multimodal time series
  - versatile function support => integrates with many packages for:
    - processing (e.g., scipy.signal, statsmodels.tsa)
    - feature extraction (e.g., numpy, scipy.stats, antropy, nolds, seglearn¹, tsfresh¹, tsfel¹)
  - feature extraction handles multiple strides & window sizes
- Efficient:
  - view-based operations for processing & feature extraction => low peak memory usage & fast execution time
- Intuitive:
  - maintains the sequence-index of the data
  - feature extraction constructs interpretable output column names
  - intuitive API
- Few assumptions about the sequence data:
  - no assumptions about sampling rate
  - able to deal with multivariate asynchronous data (i.e., data with small time offsets between the modalities)
- Advanced functionalities:
  - apply FeatureCollection.reduce after feature selection for faster inference
  - use function execution time logging to discover processing and feature extraction bottlenecks
  - embedded SeriesPipeline & FeatureCollection serialization
  - time series chunking
¹ These integrations are shown in integration-example notebooks.
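The "view-based operations" point can be illustrated with plain NumPy. This is only a sketch of the general idea (strided windows exposed as views on the original buffer rather than copies), not tsflex's internal code; the signal, window size and stride are made up for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal = np.arange(12, dtype=np.float64)
window, stride = 4, 2

# All length-4 windows as a read-only view, then keep every 2nd window.
# Neither step copies the underlying samples.
windows = sliding_window_view(signal, window_shape=window)[::stride]

print(windows.shape)                      # (5, 4)
print(np.shares_memory(windows, signal))  # True

# A feature is then a reduction over the window axis
window_means = windows.mean(axis=1)
print(window_means.tolist())              # [1.5, 3.5, 5.5, 7.5, 9.5]
```

Because the windows share memory with `signal`, memory use does not grow with the window size or the overlap between windows; only the reduced feature values are newly allocated.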
Future work 🔨
- scikit-learn integration for both processing and feature extraction<br> note: this is being actively developed on the sklearn integration branch.
- Support time series segmentation (exposing the under-the-hood strided-rolling functionality) - see this issue
- Support for multi-indexed dataframes
=> Also see the enhancement issues
Contributing 👪
We are thrilled to see your contributions to further enhance tsflex.<br>
See this guide for more instructions on how to contribute.
Referencing our package
If you use tsflex in a scientific publication, we would highly appreciate it if you cited us as:
@article{vanderdonckt2021tsflex,
    author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
    title = {tsflex: flexible time series processing \& feature extraction},
    journal = {SoftwareX},
    year = {2021},
    url = {https://github.com/predict-idlab/tsflex},
    publisher = {Elsevier}
}
Link to the paper: https://www.sciencedirect.com/science/article/pii/S2352711021001904
<p align="center"> 👤 <i>Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost</i> </p>