Home

Awesome

topk

Coverage

Go Reference Go Report Card Awesome Go

Sliding-window and regular top-K sketches.

import (
	"github.com/keilerkonzept/topk" // plain sketch
	"github.com/keilerkonzept/topk/sliding" // sliding-window sketch
)

Demo application: top K requesting IPs within a sliding time window from a web server access logs dataset

<p> <img src="https://www.keilerkonzept.com/sliding-topk-demo.gif" width="100%" alt="Sliding Top-K Demo Application"> </p>

Contents

Examples

Top-K Sketch

package main

import (
	"log"
	"github.com/keilerkonzept/topk"
)

func main() {
	// make a new sketch keeping track of k=3 items using 1024x3 = 3072 buckets.
	sketch := topk.New(3, topk.WithWidth(1024), topk.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Sliding-window Top-K Sketch

package main

import (
	"log"
	"github.com/keilerkonzept/topk/sliding"
)

func main() {
	// make a new sketch keeping track of k=3 items over a window of the last 60 ticks
	// use width=1024 x depth=3 = 3072 buckets
	sketch := sliding.New(3, 60, sliding.WithWidth(1024), sliding.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Tick()                     // advance time by one tick
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Ticks(2)                   // advance time by two ticks
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Benchmarks

Top-K Sketch

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

OperationKDepthWidthtimebytesallocs
Add1031024358.6 ns/op0 B/op0 allocs/op
Add1038192375.0 ns/op0 B/op0 allocs/op
Add1041024449.9 ns/op0 B/op0 allocs/op
Add1048192436.0 ns/op0 B/op0 allocs/op
Add10031024371.5 ns/op0 B/op0 allocs/op
Add10038192387.9 ns/op0 B/op0 allocs/op
Add10041024452.3 ns/op0 B/op0 allocs/op
Add10048192471.4 ns/op0 B/op0 allocs/op
Incr1031024257.2 ns/op0 B/op0 allocs/op
Incr1038192232.3 ns/op0 B/op0 allocs/op
Incr1041024249.1 ns/op0 B/op0 allocs/op
Incr1048192251.2 ns/op0 B/op0 allocs/op
Incr10031024264.2 ns/op0 B/op0 allocs/op
Incr10038192227.4 ns/op0 B/op0 allocs/op
Incr10041024267.1 ns/op0 B/op0 allocs/op
Incr10048192261.3 ns/op0 B/op0 allocs/op
Count1031024216.0 ns/op0 B/op0 allocs/op
Count1038192215.4 ns/op0 B/op0 allocs/op
Count1041024220.0 ns/op0 B/op0 allocs/op
Count1048192269.3 ns/op0 B/op0 allocs/op
Count10031024235.1 ns/op0 B/op0 allocs/op
Count10038192277.1 ns/op0 B/op0 allocs/op
Count10041024278.7 ns/op0 B/op0 allocs/op
Count10048192302.2 ns/op0 B/op0 allocs/op
Query1031024129.6 ns/op0 B/op0 allocs/op
Query103819298.21 ns/op0 B/op0 allocs/op
Query1041024129.9 ns/op0 B/op0 allocs/op
Query1048192114.3 ns/op0 B/op0 allocs/op
Query10031024141.2 ns/op0 B/op0 allocs/op
Query10038192140.8 ns/op0 B/op0 allocs/op
Query10041024131.1 ns/op0 B/op0 allocs/op
Query10048192109.8 ns/op0 B/op0 allocs/op

Sliding-Window Top-K Sketch

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk/sliding
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

OperationKDepthWidthWindow sizeHistory sizetimebytesallocs
Add103102410050696.9 ns/op0 B/op0 allocs/op
Add10310241001001051 ns/op0 B/op0 allocs/op
Add103819210050784.9 ns/op0 B/op0 allocs/op
Add10381921001001146 ns/op0 B/op0 allocs/op
Add1003102410050712.9 ns/op0 B/op0 allocs/op
Add100310241001001054 ns/op0 B/op0 allocs/op
Add1003819210050763.3 ns/op0 B/op0 allocs/op
Add100381921001001139 ns/op0 B/op0 allocs/op
Incr103102410050434.9 ns/op0 B/op0 allocs/op
Incr1031024100100560.7 ns/op0 B/op0 allocs/op
Incr103819210050501.1 ns/op0 B/op0 allocs/op
Incr1038192100100728.7 ns/op0 B/op0 allocs/op
Incr1003102410050425.6 ns/op0 B/op0 allocs/op
Incr10031024100100580.0 ns/op0 B/op0 allocs/op
Incr1003819210050497.8 ns/op0 B/op0 allocs/op
Incr10038192100100746.2 ns/op0 B/op0 allocs/op
Count103102410050228.5 ns/op0 B/op0 allocs/op
Count1031024100100209.3 ns/op0 B/op0 allocs/op
Count103819210050234.5 ns/op0 B/op0 allocs/op
Count1038192100100230.7 ns/op0 B/op0 allocs/op
Count1003102410050237.5 ns/op0 B/op0 allocs/op
Count10031024100100242.8 ns/op0 B/op0 allocs/op
Count1003819210050246.5 ns/op0 B/op0 allocs/op
Count10038192100100243.4 ns/op0 B/op0 allocs/op
Query103102410050101.7 ns/op0 B/op0 allocs/op
Query1031024100100104.8 ns/op0 B/op0 allocs/op
Query103819210050114.0 ns/op0 B/op0 allocs/op
Query1038192100100114.5 ns/op0 B/op0 allocs/op
Query1003102410050135.9 ns/op0 B/op0 allocs/op
Query10031024100100118.5 ns/op0 B/op0 allocs/op
Query1003819210050130.1 ns/op0 B/op0 allocs/op
Query10038192100100131.5 ns/op0 B/op0 allocs/op
Tick1031024100504191 ns/op0 B/op0 allocs/op
Tick10310241001007010 ns/op0 B/op0 allocs/op
Tick10381921005028699 ns/op0 B/op0 allocs/op
Tick103819210010090979 ns/op0 B/op0 allocs/op
Tick10031024100506539 ns/op0 B/op0 allocs/op
Tick100310241001009343 ns/op0 B/op0 allocs/op
Tick100381921005031349 ns/op0 B/op0 allocs/op
Tick1003819210010087488 ns/op0 B/op0 allocs/op

Comparison with segmentio/topk

Using benchstat:

$ go test -run='^$' -bench=BenchmarkSketchAddForComparison -count=10 | tee new.txt
$ go test -run='^$' -bench=BenchmarkSegmentioTopkSample -count=10 | tee old.txt
$ benchstat -row /K,/Depth,/Width,/Decay -col .name old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro
KDepthWidthDecaysegmentio/topk (sec/op)this package (sec/op)diff
1032560.6641.0n ± 1%373.5n ± 3%-41.73% (p=0.000 n=10)
1032560.8602.6n ± 1%387.3n ± 2%-35.73% (p=0.000 n=10)
1032560.9550.4n ± 4%431.3n ± 2%-21.63% (p=0.000 n=10)
10044600.6763.8n ± 2%427.0n ± 1%-44.09% (p=0.000 n=10)
10044600.8720.9n ± 2%459.1n ± 4%-36.30% (p=0.000 n=10)
10044600.9660.6n ± 3%539.0n ± 22%-18.41% (p=0.005 n=10)
1000669070.61107.0n ± 2%555.9n ± 8%-49.79% (p=0.000 n=10)
1000669070.81040.0n ± 4%613.4n ± 2%-41.02% (p=0.000 n=10)
1000669070.9936.5n ± 1%731.5n ± 2%-21.89% (p=0.000 n=10)
100009921030.610.693µ ± 2%1.058µ ± 2%-90.11% (p=0.000 n=10)
100009921030.810.667µ ± 1%1.182µ ± 6%-88.92% (p=0.000 n=10)
100009921030.910.724µ ± 1%1.288µ ± 2%-87.98% (p=0.000 n=10)
1000001111512920.689.385µ ± 0%1.674µ ± 1%-98.13% (p=0.000 n=10)
1000001111512920.889.349µ ± 1%1.708µ ± 1%-98.09% (p=0.000 n=10)
1000001111512920.989.284µ ± 1%1.705µ ± 1%-98.09% (p=0.000 n=10)