Awesome
streaming-papers
A curated collection of papers on streaming algorithms
Please Contribute
If you have papers you want to add, make a pull request. Categories are wide open right now, so just put in a folder that makes sense to you and we'll figure it out.
Distinct Value Counting
distinct_value_counting/Probabilistic_Multiplicity_Counting-Lieven2010a.pdf
Known Implementations
- seiflotfy/pmc - Go
- sorenmacbeth/runpmc - Clojure
===
Data Streams as Random Permutations: the Distinct Element Problem - Helmi, Lumbroso, Martinez, Viola
distinct_value_counting/data_streams_as_random_permutations.pdf
Known Implementations:
- cscotta/recordinality - Java
Distribution Functions
===
Dynamic Histograms: Capturing Evolving Data Sets - Donko Donjerkovic, Yannis Ioannidis, Raghu Ramakrishnan
distribution_functions/dynamic-histograms.pdf
Known Implementations:
- bigmlcom/histogram - Clojure
- bmizerany/perks - Go
- d2fn/shades-rb - Ruby
===
The P<sup>2</sup> Algorithm for Dynamic Calculation of Quantiles and Histograms Without Storing Observations - Raj Jain, IMRICH CHLAMTAC
distribution_functions/psqr.pdf
Known Implementations:
- GNU Scientific Library - C
- scassidy/livestats (bitbucket) - Python
- absmall/p2 - C++
- jacksonicson/psquared - java
===
Effective Computation of Biased Quantiles over Data Streams: Cormode, Korn, Muthukrishnan, Srivastava
distribution_functions/bquant.pdf
Known Implementations:
- bmizerany/perks - Go
===
Summary Statistics
Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments - Pilippe Pebay
Summary Statistics/one_pass_moments_Pebay.pdf
Known Implementations: Kitware/VTK (mirror) - C++ (check in filters/statistics/vtkStatisticsAlgorithm.h)