Awesome
Spacesaving
Simple algorithm to estimate distinct elements in an unbounded stream using bounded space. The estimate is the upper bound on the element's actual count.
Usage
Add it to you mix.exs
deps
{:spacesaving, "~> 0.0.2"}
Init with 3 spaces, so we track 3 elements
import Spacesaving
state = init(3)
Push some elements
state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)
Get the top k elements
top(state, 2) # This will be [foo: 4, bar: 3]
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3], so the inaccuracy starts to come into play when an element is kicked out, and the estimate is the upper bound
Merge two states
left = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)
merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]