topk

Sliding-window and regular top-K sketches.

A fast implementation of the HeavyKeeper top-K sketch inspired by the segmentio implementation and RedisBloom implementation. Significantly faster (~1.5x) than segmentio/topk on small sketches (k <= 1000) and much faster (10x-90x) on large sketches (k >= 10000).
A sliding-window top-K sketch, also based on HeavyKeeper, as described in "A Sketch Framework for Approximate Data Stream Processing in Sliding Windows"

import (
	"github.com/keilerkonzept/topk" // plain sketch
	"github.com/keilerkonzept/topk/sliding" // sliding-window sketch
)

Demo application: top K requesting IPs within a sliding time window from a web server access logs dataset

Examples

Top-K Sketch

package main

import (
	"log"
	"github.com/keilerkonzept/topk"
)

func main() {
	// make a new sketch keeping track of k=3 items using 1024x3 = 3072 buckets.
	sketch := topk.New(3, topk.WithWidth(1024), topk.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Sliding-window Top-K Sketch

package main

import (
	"log"
	"github.com/keilerkonzept/topk/sliding"
)

func main() {
	// make a new sketch keeping track of k=3 items over a window of the last 60 ticks
	// use width=1024 x depth=3 = 3072 buckets
	sketch := sliding.New(3, 60, sliding.WithWidth(1024), sliding.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Tick()                     // advance time by one tick
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Ticks(2)                   // advance time by two ticks
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Benchmarks

Top-K Sketch

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

Operation	K	Depth	Width	time	allocs
`Add`	10	3	1024	358.6 ns/op	0 allocs/op
`Add`	10	3	8192	375.0 ns/op	0 allocs/op
`Add`	10	4	1024	449.9 ns/op	0 allocs/op
`Add`	10	4	8192	436.0 ns/op	0 allocs/op
`Add`	100	3	1024	371.5 ns/op	0 allocs/op
`Add`	100	3	8192	387.9 ns/op	0 allocs/op
`Add`	100	4	1024	452.3 ns/op	0 allocs/op
`Add`	100	4	8192	471.4 ns/op	0 allocs/op
`Incr`	10	3	1024	257.2 ns/op	0 allocs/op
`Incr`	10	3	8192	232.3 ns/op	0 allocs/op
`Incr`	10	4	1024	249.1 ns/op	0 allocs/op
`Incr`	10	4	8192	251.2 ns/op	0 allocs/op
`Incr`	100	3	1024	264.2 ns/op	0 allocs/op
`Incr`	100	3	8192	227.4 ns/op	0 allocs/op
`Incr`	100	4	1024	267.1 ns/op	0 allocs/op
`Incr`	100	4	8192	261.3 ns/op	0 allocs/op
`Count`	10	3	1024	216.0 ns/op	0 allocs/op
`Count`	10	3	8192	215.4 ns/op	0 allocs/op
`Count`	10	4	1024	220.0 ns/op	0 allocs/op
`Count`	10	4	8192	269.3 ns/op	0 allocs/op
`Count`	100	3	1024	235.1 ns/op	0 allocs/op
`Count`	100	3	8192	277.1 ns/op	0 allocs/op
`Count`	100	4	1024	278.7 ns/op	0 allocs/op
`Count`	100	4	8192	302.2 ns/op	0 allocs/op
`Query`	10	3	1024	129.6 ns/op	0 allocs/op
`Query`	10	3	8192	98.21 ns/op	0 allocs/op
`Query`	10	4	1024	129.9 ns/op	0 allocs/op
`Query`	10	4	8192	114.3 ns/op	0 allocs/op
`Query`	100	3	1024	141.2 ns/op	0 allocs/op
`Query`	100	3	8192	140.8 ns/op	0 allocs/op
`Query`	100	4	1024	131.1 ns/op	0 allocs/op
`Query`	100	4	8192	109.8 ns/op	0 allocs/op

Sliding-Window Top-K Sketch

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk/sliding
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

Operation	K	Depth	Width	Window size	History size	time	allocs
`Add`	10	3	1024	100	50	696.9 ns/op	0 allocs/op
`Add`	10	3	1024	100	100	1051 ns/op	0 allocs/op
`Add`	10	3	8192	100	50	784.9 ns/op	0 allocs/op
`Add`	10	3	8192	100	100	1146 ns/op	0 allocs/op
`Add`	100	3	1024	100	50	712.9 ns/op	0 allocs/op
`Add`	100	3	1024	100	100	1054 ns/op	0 allocs/op
`Add`	100	3	8192	100	50	763.3 ns/op	0 allocs/op
`Add`	100	3	8192	100	100	1139 ns/op	0 allocs/op
`Incr`	10	3	1024	100	50	434.9 ns/op	0 allocs/op
`Incr`	10	3	1024	100	100	560.7 ns/op	0 allocs/op
`Incr`	10	3	8192	100	50	501.1 ns/op	0 allocs/op
`Incr`	10	3	8192	100	100	728.7 ns/op	0 allocs/op
`Incr`	100	3	1024	100	50	425.6 ns/op	0 allocs/op
`Incr`	100	3	1024	100	100	580.0 ns/op	0 allocs/op
`Incr`	100	3	8192	100	50	497.8 ns/op	0 allocs/op
`Incr`	100	3	8192	100	100	746.2 ns/op	0 allocs/op
`Count`	10	3	1024	100	50	228.5 ns/op	0 allocs/op
`Count`	10	3	1024	100	100	209.3 ns/op	0 allocs/op
`Count`	10	3	8192	100	50	234.5 ns/op	0 allocs/op
`Count`	10	3	8192	100	100	230.7 ns/op	0 allocs/op
`Count`	100	3	1024	100	50	237.5 ns/op	0 allocs/op
`Count`	100	3	1024	100	100	242.8 ns/op	0 allocs/op
`Count`	100	3	8192	100	50	246.5 ns/op	0 allocs/op
`Count`	100	3	8192	100	100	243.4 ns/op	0 allocs/op
`Query`	10	3	1024	100	50	101.7 ns/op	0 allocs/op
`Query`	10	3	1024	100	100	104.8 ns/op	0 allocs/op
`Query`	10	3	8192	100	50	114.0 ns/op	0 allocs/op
`Query`	10	3	8192	100	100	114.5 ns/op	0 allocs/op
`Query`	100	3	1024	100	50	135.9 ns/op	0 allocs/op
`Query`	100	3	1024	100	100	118.5 ns/op	0 allocs/op
`Query`	100	3	8192	100	50	130.1 ns/op	0 allocs/op
`Query`	100	3	8192	100	100	131.5 ns/op	0 allocs/op
`Tick`	10	3	1024	100	50	4191 ns/op	0 allocs/op
`Tick`	10	3	1024	100	100	7010 ns/op	0 allocs/op
`Tick`	10	3	8192	100	50	28699 ns/op	0 allocs/op
`Tick`	10	3	8192	100	100	90979 ns/op	0 allocs/op
`Tick`	100	3	1024	100	50	6539 ns/op	0 allocs/op
`Tick`	100	3	1024	100	100	9343 ns/op	0 allocs/op
`Tick`	100	3	8192	100	50	31349 ns/op	0 allocs/op
`Tick`	100	3	8192	100	100	87488 ns/op	0 allocs/op

Comparison with segmentio/topk

Using benchstat:

$ go test -run='^$' -bench=BenchmarkSketchAddForComparison -count=10 | tee new.txt
$ go test -run='^$' -bench=BenchmarkSegmentioTopkSample -count=10 | tee old.txt
$ benchstat -row /K,/Depth,/Width,/Decay -col .name old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro

K	Depth	Width	Decay	`segmentio/topk` (sec/op)	this package (sec/op)	diff
10	3	256	0.6	641.0n ± 1%	373.5n ± 3%	-41.73% (p=0.000 n=10)
10	3	256	0.8	602.6n ± 1%	387.3n ± 2%	-35.73% (p=0.000 n=10)
10	3	256	0.9	550.4n ± 4%	431.3n ± 2%	-21.63% (p=0.000 n=10)
100	4	460	0.6	763.8n ± 2%	427.0n ± 1%	-44.09% (p=0.000 n=10)
100	4	460	0.8	720.9n ± 2%	459.1n ± 4%	-36.30% (p=0.000 n=10)
100	4	460	0.9	660.6n ± 3%	539.0n ± 22%	-18.41% (p=0.005 n=10)
1000	6	6907	0.6	1107.0n ± 2%	555.9n ± 8%	-49.79% (p=0.000 n=10)
1000	6	6907	0.8	1040.0n ± 4%	613.4n ± 2%	-41.02% (p=0.000 n=10)
1000	6	6907	0.9	936.5n ± 1%	731.5n ± 2%	-21.89% (p=0.000 n=10)
10000	9	92103	0.6	10.693µ ± 2%	1.058µ ± 2%	-90.11% (p=0.000 n=10)
10000	9	92103	0.8	10.667µ ± 1%	1.182µ ± 6%	-88.92% (p=0.000 n=10)
10000	9	92103	0.9	10.724µ ± 1%	1.288µ ± 2%	-87.98% (p=0.000 n=10)
100000	11	1151292	0.6	89.385µ ± 0%	1.674µ ± 1%	-98.13% (p=0.000 n=10)
100000	11	1151292	0.8	89.349µ ± 1%	1.708µ ± 1%	-98.09% (p=0.000 n=10)
100000	11	1151292	0.9	89.284µ ± 1%	1.705µ ± 1%	-98.09% (p=0.000 n=10)

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
.github		.github
heap		heap
internal/sizeof		internal/sizeof
sliding		sliding
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum
hash.go		hash.go
options.go		options.go
sizeof.go		sizeof.go
sketch.go		sketch.go
sketch_bench_test.go		sketch_bench_test.go
sketch_test.go		sketch_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

topk

Contents

Examples

Top-K Sketch

Sliding-window Top-K Sketch

Benchmarks

Top-K Sketch

Sliding-Window Top-K Sketch

Comparison with segmentio/topk

About

Releases 9

Packages

Contributors 2

Languages

License

keilerkonzept/topk

Folders and files

Latest commit

History

Repository files navigation

topk

Contents

Examples

Top-K Sketch

Sliding-window Top-K Sketch

Benchmarks

Top-K Sketch

Sliding-Window Top-K Sketch

Comparison with segmentio/topk

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 9

Packages 0

Contributors 2

Languages

Packages