Home

Awesome

License CircleCI codecov

T-Digest

Adaptive histogram based on something like streaming k-means crossed with Q-digest.

This implementation is a descendent of Ted MergingDigest, available at: https://github.com/tdunning/t-digest/

And contains the work of Andrew Werner originally available at: https://github.com/ajwerner/tdigestc

Description

The t-Digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a very compact data structure that allows accurate estimation of quantiles. This t-Digest data structure can be used to estimate quantiles, compute other rank statistics or even to estimate related measures like trimmed means. The advantage of the t-Digest over previous digests for this purpose is that the t-Digest handles data with full floating point resolution. The accuracy of quantile estimates produced by t-Digests can be orders of magnitude more accurate than those produced by previous digest algorithms. Methods are provided to create and update t-Digests and retrieve quantiles from the accumulated distributions.

See the original paper by Ted Dunning & Otmar Ertl for more details on t-Digests.

What’s Inside

The following functions are implemented:

Build notes

# Build
git clone https://github.com/RedisBloom/t-digest-c.git
cd t-digest-c/
git submodule update --init --recursive
make

Testing

Assuming you've followed the previous build steps, it should be as easy as:

# Run the unit tests
make test

Benchmarking

Assuming you've followed the previous build steps, it should be as easy as:

# Run the benchmark
make bench

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.