Awesome

tsdownsample

Extremely fast time series downsampling 📈 for visualization, written in Rust.

Features ✨

Fast: written in rust with PyO3 bindings
- leverages optimized argminmax - which is SIMD accelerated with runtime feature detection
- scales linearly with the number of data points

- multithreaded with Rayon (in Rust) <details> <summary>Why we do not use Python multiprocessing</summary> Citing the <a href="https://pyo3.rs/v0.17.3/parallelism.html">PyO3 docs on parallelism</a>: <blockquote> CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing. </blockquote> In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with <a href="https://github.com/rayon-rs/rayon">Rayon</a>) with little to no overhead. </details>
Efficient: memory efficient
- works on views of the data (no copies)
- no intermediate data structures are created
Flexible: works on any type of data
- supported datatypes are
 - for x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
 - for y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool
 <details> <summary>!! 🚀 <code>f16</code> <a href="https://github.com/jvdd/argminmax">argminmax</a> is 200-300x faster than numpy</summary> In contrast with all other data types above, <code>f16</code> is *not* hardware supported (i.e., no instructions for f16) by most modern CPUs!! 🐌 Programming languages facilitate support for this datatype by either (i) upcasting to f32 or (ii) using a software implementation. 💡 As for argminmax, only comparisons are needed - and thus no arithmetic operations - creating a symmetrical ordinal mapping from <code>f16</code> to <code>i16</code> is sufficient. This mapping allows to use the hardware supported scalar and SIMD <code>i16</code> instructions - while not producing any memory overhead 🎉 More details are described in <a href="https://github.com/jvdd/argminmax/pull/1">argminmax PR #1</a>. </details>
Easy to use: simple & flexible API

Install

pip install tsdownsample

Usage

from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possible irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]

Downsampling algorithms & API

Downsampling API 📑

Each downsampling algorithm is implemented as a class that implements a downsample method. The signature of the downsample method:

downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

Arguments:

x is optional
x and y are both positional arguments
n_out is a mandatory keyword argument that defines the number of output values*
**kwargs are optional keyword arguments (see table below):
- parallel: whether to use multi-threading (default: False)
  ❗ The max number of threads can be configured with the TSDOWNSAMPLE_MAX_THREADS ENV var (e.g. os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4")
- ...

Returns: a ndarray[uint64] of indices that can be used to index the original data.

*When there are gaps in the time series, fewer than n_out indices may be returned.

Downsampling algorithms 📈

The following downsampling algorithms (classes) are implemented:

Downsampler	Description	`**kwargs`
`MinMaxDownsampler`	selects the min and max value in each bin	`parallel`
`M4Downsampler`	selects the min, max, first and last value in each bin	`parallel`
`LTTBDownsampler`	performs the Largest Triangle Three Buckets algorithm	`parallel`
`MinMaxLTTBDownsampler`	(new two-step algorithm 🎉) first selects `n_out` * `minmax_ratio` min and max values, then further reduces these to `n_out` values using the Largest Triangle Three Buckets algorithm	`parallel`, `minmax_ratio`<sup>*</sup>

*Default value for minmax_ratio is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332

Handling NaNs

This library supports two NaN-policies:

Omit NaNs (NaNs are ignored during downsampling).
Return index of first NaN once there is at least one present in the bin of the considered data.

Omit `NaN`s	Return `NaN`s
`MinMaxDownsampler`	`NaNMinMaxDownsampler`
`M4Downsampler`	`NaNM4Downsampler`
`MinMaxLTTBDownsampler`	`NaNMinMaxLTTBDownsampler`
`LTTBDownsampler`

Note that NaNs are not supported for x-data.

Limitations & assumptions 🚨

Assumes;

x-data is (non-strictly) monotonic increasing (i.e., sorted)
no NaNs in x-data

👤 Jeroen Van Der Donckt