BLAKE3

<p> <a href="https://pkg.go.dev/github.com/zeebo/blake3"><img src="https://img.shields.io/badge/doc-reference-007d9b?logo=go&style=flat-square" alt="go.dev" /></a> <a href="https://goreportcard.com/report/github.com/zeebo/blake3"><img src="https://goreportcard.com/badge/github.com/zeebo/blake3?style=flat-square" alt="Go Report Card" /></a> <a href="https://sourcegraph.com/github.com/zeebo/blake3?badge"><img src="https://sourcegraph.com/github.com/zeebo/blake3/-/badge.svg?style=flat-square" alt="SourceGraph" /></a> </p>

Pure Go implementation of BLAKE3 with AVX2 and SSE4.1 acceleration.

Special thanks to the excellent avo, which made writing the vectorized versions much easier.

Benchmarks

Caveats

This library makes different design decisions from the upstream Rust crate around internal buffering. Because it neither targets embedded systems nor supports multithreading, it does its own internal buffering. As a result, users do not have to provide large enough buffers to get the best possible performance, but the library performs worse on smaller input sizes. Some notes:

Charts

When a large input is hashed from a single full buffer, both libraries avoid a lot of data copying, use vectorized instructions to hash as fast as possible, and perform similarly.

[Chart: Large Full Buffer]

For incremental writes, you must give the Rust version buffers large enough that it can use vectorized instructions. This Go library performs consistently regardless of the size of each write passed to the update function.

[Chart: Incremental]
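A minimal sketch of what incremental writing looks like from the caller's side. It is written against the standard hash.Hash interface, which this library's Hasher is assumed to satisfy; sha256.New stands in for blake3.New() so that the example runs with only the standard library. The helper names newHasher and hashInChunks are hypothetical. The sketch only shows that the digest does not depend on the chunk size; it does not measure speed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHasher is a stand-in constructor. With this library it would
// presumably be blake3.New() (assumed to satisfy hash.Hash); sha256.New
// is used here so the sketch runs with only the standard library.
func newHasher() hash.Hash { return sha256.New() }

// hashInChunks feeds data to a fresh hasher chunkSize bytes at a time
// and returns the hex-encoded digest.
func hashInChunks(data []byte, chunkSize int) string {
	if chunkSize <= 0 {
		chunkSize = len(data) // fall back to a single write
	}
	h := newHasher()
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		h.Write(data[:n]) // hash.Hash documents that Write never returns an error
		data = data[n:]
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	data := make([]byte, 64*1024)
	for i := range data {
		data[i] = byte(i)
	}
	// The same input written in very different chunk sizes.
	for _, size := range []int{1, 64, 1024, len(data)} {
		fmt.Printf("chunk=%6d digest=%s\n", size, hashInChunks(data, size))
	}
}
```

Every line printed carries the same digest. With a hasher that buffers internally, the caller can pick whatever chunk size is convenient rather than sizing writes for the hash function.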

The downside of internal buffering is most apparent with small inputs, where most of the time is spent initializing the hasher state. In terms of hashing rate, the difference is 3-4x, but in an absolute sense it's ~100ns (see tables below). If you wish to hash a large number of very small strings and you care about those nanoseconds, be sure to use the Reset method to avoid re-initializing the state.

[Chart: Small Full Buffer]
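The Reset advice can be sketched as follows: one hasher is created up front and reused for every small input. The code is written against the standard hash.Hash interface, which this library's Hasher is assumed to satisfy; sha256.New stands in for blake3.New() so that the example runs with only the standard library. The helper names newHasher and hashEach are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// newHasher is a stand-in constructor. With this library it would
// presumably be blake3.New() (assumed to satisfy hash.Hash).
func newHasher() hash.Hash { return sha256.New() }

// hashEach hashes every input independently but reuses one hasher,
// calling Reset between inputs instead of constructing a new one.
func hashEach(inputs []string) []string {
	h := newHasher()
	buf := make([]byte, 0, h.Size()) // reused digest buffer
	out := make([]string, 0, len(inputs))
	for _, s := range inputs {
		h.Reset() // return the hasher to its initial state
		h.Write([]byte(s))
		buf = h.Sum(buf[:0]) // append the digest into the reused buffer
		out = append(out, hex.EncodeToString(buf))
	}
	return out
}

func main() {
	for i, d := range hashEach([]string{"alpha", "beta", "alpha"}) {
		fmt.Println(i, d)
	}
}
```

The first and third digests are identical, which confirms that Reset leaves no state behind from the previous input. Whether the saving matters depends on the workload; the Reset columns in the timing tables show what was measured for this library.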

Timing Tables

Small

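| Size | Full Buffer | Reset | Full Buffer Rate | Reset Rate |
|------|-------------|-------|------------------|------------|
| 64 B | 205 ns | 86.5 ns | 312 MB/s | 740 MB/s |
| 256 B | 364 ns | 250 ns | 703 MB/s | 1.03 GB/s |
| 512 B | 575 ns | 468 ns | 892 MB/s | 1.10 GB/s |
| 768 B | 795 ns | 682 ns | 967 MB/s | 1.13 GB/s |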
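The rate columns appear to be the input size divided by the time per hash, with 1 MB taken as 10^6 bytes. That reading is an inference from the numbers, not something the text states. A small sketch that reproduces the 64-byte row under that assumption (rateMBps is a hypothetical helper name):

```go
package main

import "fmt"

// rateMBps converts a payload size in bytes and a duration in nanoseconds
// into a throughput in MB/s, assuming 1 MB = 10^6 bytes.
func rateMBps(sizeBytes int, nanos float64) float64 {
	bytesPerSecond := float64(sizeBytes) / (nanos * 1e-9)
	return bytesPerSecond / 1e6
}

func main() {
	// Timings copied from the 64-byte row of the Small table.
	fmt.Printf("64 B full buffer: %.0f MB/s\n", rateMBps(64, 205))
	fmt.Printf("64 B reset:       %.0f MB/s\n", rateMBps(64, 86.5))
	// Absolute difference between the two timing columns.
	fmt.Printf("full buffer minus reset: %.1f ns\n", 205-86.5)
}
```

Under that assumption the two rates round to 312 MB/s and 740 MB/s, matching the table, and the absolute gap between the two timings is 118.5 ns.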

Large

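| Size | Incremental | Full Buffer | Reset | Incremental Rate | Full Buffer Rate | Reset Rate |
|------|-------------|-------------|-------|------------------|------------------|------------|
| 1 KiB | 1.02 µs | 1.01 µs | 891 ns | 1.00 GB/s | 1.01 GB/s | 1.15 GB/s |
| 2 KiB | 2.11 µs | 2.07 µs | 1.95 µs | 968 MB/s | 990 MB/s | 1.05 GB/s |
| 4 KiB | 2.28 µs | 2.15 µs | 2.05 µs | 1.80 GB/s | 1.90 GB/s | 2.00 GB/s |
| 8 KiB | 2.64 µs | 2.52 µs | 2.44 µs | 3.11 GB/s | 3.25 GB/s | 3.36 GB/s |
| 16 KiB | 4.93 µs | 4.54 µs | 4.48 µs | 3.33 GB/s | 3.61 GB/s | 3.66 GB/s |
| 32 KiB | 9.41 µs | 8.62 µs | 8.54 µs | 3.48 GB/s | 3.80 GB/s | 3.84 GB/s |
| 64 KiB | 18.2 µs | 16.7 µs | 16.6 µs | 3.59 GB/s | 3.91 GB/s | 3.94 GB/s |
| 128 KiB | 36.3 µs | 32.9 µs | 33.1 µs | 3.61 GB/s | 3.99 GB/s | 3.96 GB/s |
| 256 KiB | 72.5 µs | 65.7 µs | 66.0 µs | 3.62 GB/s | 3.99 GB/s | 3.97 GB/s |
| 512 KiB | 145 µs | 131 µs | 132 µs | 3.60 GB/s | 4.00 GB/s | 3.97 GB/s |
| 1024 KiB | 290 µs | 262 µs | 262 µs | 3.62 GB/s | 4.00 GB/s | 4.00 GB/s |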

No ASM

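| Size | Incremental | Full Buffer | Reset | Incremental Rate | Full Buffer Rate | Reset Rate |
|------|-------------|-------------|-------|------------------|------------------|------------|
| 64 B | 253 ns | 254 ns | 134 ns | 253 MB/s | 252 MB/s | 478 MB/s |
| 256 B | 553 ns | 557 ns | 441 ns | 463 MB/s | 459 MB/s | 580 MB/s |
| 512 B | 948 ns | 953 ns | 841 ns | 540 MB/s | 538 MB/s | 609 MB/s |
| 768 B | 1.38 µs | 1.40 µs | 1.35 µs | 558 MB/s | 547 MB/s | 570 MB/s |
| 1 KiB | 1.77 µs | 1.77 µs | 1.70 µs | 577 MB/s | 580 MB/s | 602 MB/s |
| 1024 KiB | 880 µs | 883 µs | 878 µs | 596 MB/s | 595 MB/s | 598 MB/s |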

Without assembly, the speed caps out at around 1 KiB, so most rows have been omitted from the table.