tsdownsample
Extremely fast time series downsampling for visualization, written in Rust.
Features
- Fast: written in Rust with PyO3 bindings
  - leverages optimized argminmax, which is SIMD accelerated with runtime feature detection
  - scales linearly with the number of data points
  - multithreaded with Rayon (in Rust) <details> <summary><i>Why we do not use Python multiprocessing</i></summary> Citing the <a href="https://pyo3.rs/v0.17.3/parallelism.html">PyO3 docs on parallelism</a>:<br> <blockquote> CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing. </blockquote> In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with <a href="https://github.com/rayon-rs/rayon">Rayon</a>) with little to no overhead. </details>
- Efficient: memory efficient
  - works on views of the data (no copies)
  - no intermediate data structures are created
- Flexible: works on any type of data
  - supported datatypes are
    - for x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
    - for y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool
- Easy to use: simple & flexible API
Install
pip install tsdownsample
Usage
from tsdownsample import MinMaxLTTBDownsampler
import numpy as np
# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))
# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)
# Select downsampled data
downsampled_y = y[s_ds]
# Downsample to 1000 points using the (possibly irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)
# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]
Downsampling algorithms & API
Downsampling API
Each downsampling algorithm is implemented as a class that implements a downsample method.
The signature of the downsample method:
downsample([x], y, n_out, **kwargs) -> ndarray[uint64]
Arguments:
- x is optional
- x and y are both positional arguments
- n_out is a mandatory keyword argument that defines the number of output values<sup>*</sup>
- **kwargs are optional keyword arguments (see table below):
  - parallel: whether to use multi-threading (default: False). The max number of threads can be configured with the TSDOWNSAMPLE_MAX_THREADS ENV var (e.g. os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4")
  - ...
Returns: an ndarray[uint64] of indices that can be used to index the original data.
<sup>*</sup><i>When there are gaps in the time series, fewer than n_out indices may be returned.</i>
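The call shape described above (optional positional x, positional y, keyword-only n_out, extra keyword options) can be sketched in plain Python. The class below is a hypothetical stand-in, not part of tsdownsample: `EveryNthDownsampler` and its evenly spaced index selection are assumptions made purely to illustrate how such a signature behaves, and it returns a plain list rather than an ndarray[uint64].

```python
class EveryNthDownsampler:
    """Hypothetical stand-in that mirrors the documented call shape only."""

    def downsample(self, *args, n_out, **kwargs):
        # Positional arguments are either (y,) or (x, y); n_out is keyword-only.
        if len(args) == 1:
            x, y = None, args[0]
        elif len(args) == 2:
            x, y = args
        else:
            raise ValueError("expected downsample([x], y, n_out=...)")
        if x is not None and len(x) != len(y):
            raise ValueError("x and y must have the same length")
        if n_out < 2:
            raise ValueError("n_out must be at least 2 in this sketch")
        n = len(y)
        if n_out >= n:
            return list(range(n))
        # Evenly spaced indices that always include the first and last sample.
        # Extra keyword options such as parallel=... are accepted but ignored here.
        step = (n - 1) / (n_out - 1)
        return [round(i * step) for i in range(n_out)]


y = [v * v for v in range(100)]
idx = EveryNthDownsampler().downsample(y, n_out=5, parallel=False)
downsampled_y = [y[i] for i in idx]
```

Because n_out is keyword-only in this sketch, passing it positionally fails immediately instead of being silently mistaken for y.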
Downsampling algorithms
The following downsampling algorithms (classes) are implemented:
Downsampler | Description | **kwargs |
---|---|---|
MinMaxDownsampler | selects the min and max value in each bin | parallel |
M4Downsampler | selects the min, max, first and last value in each bin | parallel |
LTTBDownsampler | performs the Largest Triangle Three Buckets algorithm | parallel |
MinMaxLTTBDownsampler | (new two-step algorithm) first selects n_out * minmax_ratio min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm | parallel, minmax_ratio<sup>*</sup> |
<sup>*</sup><i>Default value for minmax_ratio is 4, which has been shown empirically to be a good default. More details here: https://arxiv.org/abs/2305.00332</i>
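To make the two-step idea concrete, here is a minimal pure-Python sketch of a MinMax preselection followed by Largest Triangle Three Buckets. It is an illustration under simplifying assumptions, not the library's Rust implementation: the function names are hypothetical, bins are equal-sized index ranges (the constant sampling rate case), and the first and last samples are always kept.

```python
def minmax_indices(y, n_out):
    """Index of the min and of the max in each of n_out // 2 equal-sized bins."""
    n, n_bins = len(y), n_out // 2
    picked = []
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        if start == stop:  # empty bin (more bins than samples)
            continue
        i_min = min(range(start, stop), key=y.__getitem__)
        i_max = max(range(start, stop), key=y.__getitem__)
        picked.extend(sorted({i_min, i_max}))
    return picked


def lttb_indices(x, y, n_out):
    """Largest Triangle Three Buckets: keeps first and last, one point per bucket."""
    n = len(y)
    if n_out < 3:
        raise ValueError("n_out must be at least 3 in this sketch")
    if n_out >= n:
        return list(range(n))
    inner, n_buckets = n - 2, n_out - 2
    picked, a = [0], 0
    for b in range(n_buckets):
        start = 1 + b * inner // n_buckets
        stop = 1 + (b + 1) * inner // n_buckets
        # The next bucket supplies the third triangle corner (its average);
        # for the final bucket that corner is the last sample.
        nxt_stop = min(1 + (b + 2) * inner // n_buckets, n - 1)
        nxt = range(stop, nxt_stop) or range(n - 1, n)
        avg_x = sum(x[i] for i in nxt) / len(nxt)
        avg_y = sum(y[i] for i in nxt) / len(nxt)

        def area(i):
            # Twice the triangle area spanned by point a, point i, and the average.
            return abs((x[a] - avg_x) * (y[i] - y[a]) - (x[a] - x[i]) * (avg_y - y[a]))

        a = max(range(start, stop), key=area)
        picked.append(a)
    picked.append(n - 1)
    return picked


def minmax_lttb_indices(x, y, n_out, minmax_ratio=4):
    """Preselect about n_out * minmax_ratio extrema, then reduce them with LTTB."""
    n = len(y)
    if n <= n_out * minmax_ratio:
        return lttb_indices(x, y, n_out)
    pre = sorted(set(minmax_indices(y, n_out * minmax_ratio)) | {0, n - 1})
    sub = lttb_indices([x[i] for i in pre], [y[i] for i in pre], n_out)
    return [pre[j] for j in sub]
```

With n_out=1000 and the default minmax_ratio of 4, the first step keeps roughly 4000 candidate points, so the more expensive triangle computation runs on those instead of the full series.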
Handling NaNs
This library supports two NaN-policies:
- Omit NaNs (NaNs are ignored during downsampling).
- Return index of first NaN once there is at least one present in the bin of the considered data.
Omit NaNs | Return NaNs |
---|---|
MinMaxDownsampler | NaNMinMaxDownsampler |
M4Downsampler | NaNM4Downsampler |
MinMaxLTTBDownsampler | NaNMinMaxLTTBDownsampler |
LTTBDownsampler | |
Note that NaNs are not supported for x-data.
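The difference between the two policies can be illustrated for a single bin. The helpers below are hypothetical pure-Python sketches, not the library's code; in particular, returning an empty list for an all-NaN bin under the omit policy is an assumption of this sketch.

```python
import math


def bin_minmax_omit_nan(y, start, stop):
    """Omit policy: ignore NaNs and return the min/max indices of the rest."""
    valid = [i for i in range(start, stop) if not math.isnan(y[i])]
    if not valid:
        return []  # assumption: an all-NaN bin contributes nothing
    i_min = min(valid, key=y.__getitem__)
    i_max = max(valid, key=y.__getitem__)
    return sorted({i_min, i_max})


def bin_minmax_return_nan(y, start, stop):
    """Return policy: if the bin holds any NaN, return the index of the first one."""
    for i in range(start, stop):
        if math.isnan(y[i]):
            return [i]
    i_min = min(range(start, stop), key=y.__getitem__)
    i_max = max(range(start, stop), key=y.__getitem__)
    return sorted({i_min, i_max})


y_bin = [1.0, float("nan"), -2.0, 5.0]
omit_result = bin_minmax_omit_nan(y_bin, 0, len(y_bin))
nan_result = bin_minmax_return_nan(y_bin, 0, len(y_bin))
```

Under the return policy the NaN stays visible in the downsampled output, which can be useful when a plot should show where data is missing.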
Limitations & assumptions
Assumes:
- x-data is (non-strictly) monotonic increasing (i.e., sorted)
- no NaNs in x-data
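If the x-data comes from an untrusted source, both assumptions can be checked before downsampling. The helper below is a hypothetical pure-Python sketch (`check_x` is not a tsdownsample function) that walks the sequence once.

```python
import math


def check_x(x):
    """Raise ValueError if x contains a NaN or is not sorted in non-decreasing order."""
    prev = None
    for i, value in enumerate(x):
        if isinstance(value, float) and math.isnan(value):
            raise ValueError(f"x[{i}] is NaN")
        # Equal neighbours are allowed: the ordering is non-strict.
        if prev is not None and value < prev:
            raise ValueError(f"x is not sorted: x[{i}] < x[{i - 1}]")
        prev = value


check_x([0, 1, 1, 2])  # passes silently: repeated values are fine
```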
<p align="center"> <i>Jeroen Van Der Donckt</i> </p>