Awesome
Blake2Fast
These RFC 7693-compliant BLAKE2 implementations have been tuned for high speed and low memory usage. Span<byte>
is used throughout for lower memory overhead compared to byte[]
based APIs.
On modern .NET, Blake2Fast includes SIMD-accelerated (SSE2 - AVX-512) implementations of both BLAKE2b and BLAKE2s.
Installation
Blake2Fast is available on NuGet
PM> Install-Package SauceControl.Blake2Fast
Usage
All-at-Once Hashing
The simplest way to calculate a hash is the all-at-once ComputeHash
method.
var hash = Blake2b.ComputeHash(data);
BLAKE2 supports variable digest lengths from 1 to 32 bytes for BLAKE2s or 1 to 64 bytes for BLAKE2b.
var hash = Blake2b.ComputeHash(42, data);
BLAKE2 also natively supports keyed hashing.
var hash = Blake2b.ComputeHash(key, data);
Incremental Hashing
BLAKE2 hashes can be incrementally updated if you do not have the data available all at once.
async Task<byte[]> ComputeHashAsync(Stream data)
{
var hasher = Blake2b.CreateIncrementalHasher();
var buffer = ArrayPool<byte>.Shared.Rent(4096);
int bytesRead;
while ((bytesRead = await data.ReadAsync(buffer, 0, buffer.Length)) > 0)
hasher.Update(buffer.AsSpan(0, bytesRead));
ArrayPool<byte>.Shared.Return(buffer);
return hasher.Finish();
}
For convenience, the generic Update<T>()
method accepts any value type that does not contain reference fields, plus arrays and Spans of compatible types.
byte[] ComputeCompositeHash()
{
var hasher = Blake2b.CreateIncrementalHasher();
hasher.Update(42);
hasher.Update(Math.Pi);
hasher.Update("I love deadlines. I like the whooshing sound they make as they fly by.".AsSpan());
return hasher.Finish();
}
Be aware that the value passed to Update
is added to the hash state in its current memory layout, which may differ based on platform (endianness) or struct layout. Use care when calling Update
with types other than byte
if the computed hashes are to be used across application or machine boundaries.
For example, if you are adding a string to the hash state, you may hash the characters in memory layout as shown above, or you may use Encoding.GetBytes
to ensure the string bytes are handled consistently across platforms.
Allocation-Free Hashing
The output hash digest can be written to an existing buffer to avoid allocating a new array each time.
Span<byte> buffer = stackalloc byte[Blake2b.DefaultDigestLength];
Blake2b.ComputeAndWriteHash(data, buffer);
This is especially useful when performing an iterative hash, as might be used in a key derivation function.
byte[] DeriveBytes(string password, ReadOnlySpan<byte> salt)
{
// Create key from password, then hash the salt using the key
var pwkey = Blake2b.ComputeHash(Encoding.UTF8.GetBytes(password));
var hbuff = Blake2b.ComputeHash(pwkey, salt);
// Hash the hash lots of times, re-using the same buffer
for (int i = 0; i < 999_999; i++)
Blake2b.ComputeAndWriteHash(pwkey, hbuff, hbuff);
return hbuff;
}
System.Security.Cryptography Interop
For interoperating with code that uses System.Security.Cryptography
primitives, Blake2Fast can create a HashAlgorithm
wrapper. The wrapper inherits from HMAC
in case keyed hashing is required.
HashAlgorithm
is less efficient than the above methods, so use it only when necessary for compatibility.
byte[] WriteDataAndCalculateHash(byte[] data, string outFile)
{
using (var hashAlg = Blake2b.CreateHashAlgorithm())
using (var fileStream = new FileStream(outFile, FileMode.Create))
using (var cryptoStream = new CryptoStream(fileStream, hashAlg, CryptoStreamMode.Write))
{
cryptoStream.Write(data, 0, data.Length);
cryptoStream.FlushFinalBlock();
return hashAlg.Hash;
}
}
Benchmarks
Sample results from the Blake.Bench project. Benchmarks were run on the .NET Core 3.1 x64 runtime. Configuration below:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.301
[Host] : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
ShortRun : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
Job=ShortRun IterationCount=3 LaunchCount=1 WarmupCount=3
Blake2Fast vs .NET in-box algorithms (MD5 and SHA2)
| Method | Data Length | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |------------:|----------------:|--------------:|-------------:|-------:|------:|------:|----------:|
| BLAKE2-256 | 3 | 106.2 ns | 8.01 ns | 0.44 ns | 0.0134 | - | - | 56 B |
| BLAKE2-512 | 3 | 144.2 ns | 30.51 ns | 1.67 ns | 0.0210 | - | - | 88 B |
| MD5 | 3 | 559.2 ns | 89.97 ns | 4.93 ns | 0.0496 | - | - | 208 B |
| SHA-256 | 3 | 722.7 ns | 61.84 ns | 3.39 ns | 0.0572 | - | - | 240 B |
| SHA-512 | 3 | 749.2 ns | 40.06 ns | 2.20 ns | 0.0725 | - | - | 304 B |
| | | | | | | | | |
| BLAKE2-256 | 3268 | 3,933.6 ns | 148.09 ns | 8.12 ns | 0.0076 | - | - | 56 B |
| BLAKE2-512 | 3268 | 2,429.7 ns | 107.58 ns | 5.90 ns | 0.0191 | - | - | 88 B |
| MD5 | 3268 | 5,866.8 ns | 171.88 ns | 9.42 ns | 0.0458 | - | - | 208 B |
| SHA-256 | 3268 | 12,719.1 ns | 559.17 ns | 30.65 ns | 0.0458 | - | - | 240 B |
| SHA-512 | 3268 | 7,577.3 ns | 555.80 ns | 30.47 ns | 0.0610 | - | - | 304 B |
| | | | | | | | | |
| BLAKE2-256 | 3145728 | 3,667,519.1 ns | 77,804.44 ns | 4,264.72 ns | - | - | - | 56 B |
| BLAKE2-512 | 3145728 | 2,240,879.0 ns | 101,729.66 ns | 5,576.15 ns | - | - | - | 88 B |
| MD5 | 3145728 | 5,108,604.6 ns | 189,941.46 ns | 10,411.33 ns | - | - | - | 208 B |
| SHA-256 | 3145728 | 11,038,065.4 ns | 311,623.07 ns | 17,081.11 ns | - | - | - | 240 B |
| SHA-512 | 3145728 | 6,599,771.6 ns | 251,528.85 ns | 13,787.15 ns | - | - | - | 304 B |
Note that the built-in cryptographic hash algorithms in .NET Core forward to platform-native libraries for their implementations. On Windows, this means the implementations are provided by Windows CNG. Performance may differ on Linux.
On .NET Framework, only scalar (not SIMD) implementations are available for both BLAKE2 algorithms. The scalar implementations outperform the built-in .NET algorithms in 64-bit applications, but they are slower for large input data on 32-bit. The SIMD implementations available in .NET Core are faster than the built-in algorithms on either processor architecture.
Blake2Fast vs other BLAKE2b implementations available on NuGet
| Method | Data Length | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |------------:|----------------:|-----------------:|----------------:|----------:|----------:|----------:|------------:|
| *Blake2Fast.Blake2b | 3 | 139.5 ns | 2.71 ns | 0.15 ns | 0.0076 | - | - | 32 B |
| Blake2Sharp(1) | 3 | 382.0 ns | 41.26 ns | 2.26 ns | 0.2065 | - | - | 864 B |
| ByteTerrace(2) | 3 | 442.5 ns | 40.06 ns | 2.20 ns | 0.1087 | - | - | 456 B |
| S.D.HashFunction(3) | 3 | 1,818.6 ns | 28.93 ns | 1.59 ns | 0.4158 | - | - | 1744 B |
| Konscious(4) | 3 | 1,234.3 ns | 23.67 ns | 1.30 ns | 0.2289 | - | - | 960 B |
| Isopoh(5) | 3 | 10,403,770.2 ns | 96,909,560.25 ns | 5,311,940.00 ns | 1736.0840 | 1722.4121 | 1722.4121 | 527973075 B |
| Blake2Core(6) | 3 | 1,407.4 ns | 137.05 ns | 7.51 ns | 0.2060 | - | - | 864 B |
| NSec(7) | 3 | 170.2 ns | 17.42 ns | 0.96 ns | 0.0267 | - | - | 112 B |
| | | | | | | | | |
| *Blake2Fast.Blake2b | 3268 | 2,413.4 ns | 48.19 ns | 2.64 ns | 0.0076 | - | - | 32 B |
| Blake2Sharp(1) | 3268 | 4,378.4 ns | 278.87 ns | 15.29 ns | 0.2060 | - | - | 864 B |
| ByteTerrace(2) | 3268 | 4,095.5 ns | 295.62 ns | 16.20 ns | 0.1068 | - | - | 456 B |
| S.D.HashFunction(3) | 3268 | 29,730.2 ns | 2,388.67 ns | 130.93 ns | 2.2278 | - | - | 9344 B |
| Konscious(4) | 3268 | 16,682.2 ns | 997.62 ns | 54.68 ns | 0.2136 | - | - | 960 B |
| Isopoh(5) | 3268 | 1,708,548.1 ns | 3,287,267.60 ns | 180,186.23 ns | 220.7031 | 218.7500 | 218.7500 | 67111641 B |
| Blake2Core(6) | 3268 | 20,619.3 ns | 1,859.13 ns | 101.90 ns | 0.1831 | - | - | 864 B |
| NSec(7) | 3268 | 2,459.1 ns | 252.85 ns | 13.86 ns | 0.0267 | - | - | 112 B |
| | | | | | | | | |
| *Blake2Fast.Blake2b | 3145728 | 2,242,018.9 ns | 156,659.45 ns | 8,587.03 ns | - | - | - | 32 B |
| Blake2Sharp(1) | 3145728 | 3,955,138.2 ns | 113,166.53 ns | 6,203.04 ns | - | - | - | 864 B |
| ByteTerrace(2) | 3145728 | 3,641,689.8 ns | 58,221.45 ns | 3,191.31 ns | - | - | - | 457 B |
| S.D.HashFunction(3) | 3145728 | 27,450,332.3 ns | 1,245,091.70 ns | 68,247.68 ns | 1781.2500 | - | - | 7472544 B |
| Konscious(4) | 3145728 | 15,179,139.1 ns | 668,577.20 ns | 36,646.97 ns | - | - | - | 960 B |
| Isopoh(5) | 3145728 | 4,011,376.3 ns | 477,836.99 ns | 26,191.86 ns | - | - | - | 984 B |
| Blake2Core(6) | 3145728 | 18,704,691.7 ns | 1,247,107.98 ns | 68,358.20 ns | - | - | - | 864 B |
| NSec(7) | 3145728 | 2,247,392.2 ns | 13,390.91 ns | 734.00 ns | - | - | - | 112 B |
- (1)
Blake2Sharp
is the reference C# BLAKE2b implementation from the official BLAKE2 repo. This version is not published to NuGet, so the source is included in the benchmark project directly. - (2)
ByteTerrace.Maths.Cryptography.Blake2
version 0.0.6. - (3)
System.Data.HashFunction.Blake2
version 2.0.0. BLAKE2b only. - (4)
Konscious.Security.Cryptography.Blake2
version 1.0.9. BLAKE2b only. - (5)
Isopoh.Cryptography.Blake2b
version 1.1.3. Yes, it really is that slow on incomplete block lengths. - (6)
Blake2Core
version 1.0.0. This package contains the reference Blake2Sharp code compiled as a debug (unoptimized) build. BenchmarkDotNet errors in such cases, so the settings were overridden to allow this library to run. - (7)
NSec.Cryptography
20.2.0. This implementation of BLAKE2b is not RFC-compliant in that it does not support digest sizes less than 32 bytes or keyed hashing. NSec.Cryptography wraps the nativelibsodium
library, which contains an AVX2 implementation of BLAKE2b.
Blake2Fast vs other BLAKE2s implementations available on NuGet
| Method | Data Length | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |------------:|---------------:|--------------:|-------------:|-------:|------:|------:|----------:|
| *Blake2Fast.Blake2s | 3 | 106.5 ns | 2.30 ns | 0.13 ns | 0.0076 | - | - | 32 B |
| Blake2s-net(1) | 3 | 274.4 ns | 39.08 ns | 2.14 ns | 0.1278 | - | - | 536 B |
| ByteTerrace(2) | 3 | 303.6 ns | 5.69 ns | 0.31 ns | 0.0763 | - | - | 320 B |
| | | | | | | | | |
| *Blake2Fast.Blake2s | 3268 | 3,941.2 ns | 388.64 ns | 21.30 ns | 0.0076 | - | - | 32 B |
| Blake2s-net(1) | 3268 | 6,044.0 ns | 251.18 ns | 13.77 ns | 0.1221 | - | - | 536 B |
| ByteTerrace(2) | 3268 | 6,287.7 ns | 715.20 ns | 39.20 ns | 0.0763 | - | - | 320 B |
| | | | | | | | | |
| *Blake2Fast.Blake2s | 3145728 | 3,669,570.7 ns | 308,040.39 ns | 16,884.73 ns | - | - | - | 32 B |
| Blake2s-net(1) | 3145728 | 5,549,277.3 ns | 171,690.31 ns | 9,410.93 ns | - | - | - | 536 B |
| ByteTerrace(2) | 3145728 | 5,754,080.2 ns | 75,019.78 ns | 4,112.09 ns | - | - | - | 320 B |
- (1)
blake2s-net
version 0.1.0. This is a conversion of the reference Blake2Sharp code to support BLAKE2s. - (2)
ByteTerrace.Maths.Cryptography.Blake2
version 0.0.6.
You can find more detailed comparisons between Blake2Fast and other .NET BLAKE2 implementations starting here. The short version is that Blake2Fast is the fastest and lowest-memory version of RFC-compliant BLAKE2 available for .NET.