Home

Awesome

FARSH stands for Fast and Reliable (but not Secure) 32-bit Hash. While established new speed records and successfully passed the SMHasher testsuite, it's not as reliable as the competition. Discussion and additional benchmarks.

Features / to-do list

API

Internals

The current FARSH version combines two hashing algorithms.

Low-level hashing algorithm splits all input data into 1024-byte blocks and computes hash value for every block. It's the very short cycle borrowed from UHASH that combines 1024 bytes of input data with 1024 bytes of key material. The hash value returned by this cycle is 64-bit long, and UMAC thesis proved that it has 32 bits of entropy. So the low-level algorithm compresses each 1024-byte block of input data into 64-bit value carrying 32 bits of entropy.

High-level hashing algorithm is a stripped-down version of xxHash64. It receives sequence of 64-bit values from the previous level and combines them into final 32-bit hash result. Since the original xxHash64 algorithm successfully passes all SMHasher tests while computing 64-bit hash from raw data, it's no surprise that modified algorithm is able to compute high-quality 32-bit hash from the sequence of numbers each carrying 32 bits of entropy.

The power of the FARSH algorithm comes from its inner cycle, that is very short (read: fast) and allows highly-parallel implementations, so it can fully exploit power of multi-core, SIMD, VLIW and SIMT (GPU) architectures. At the same time, there is math proof that it can deliver 32 bits of entropy so we can use it without any doubts.

Universal hashing

Main loop uses universal hashing formula from UMAC with a precomputed key material of 1024 bytes (plus 512 bytes for longer hashes). FARSH is essentially UHASH with higher-level hashing algorithms replaced with simpler non-cryptographic one. The universal hashing formula used here (and copied intact from UMAC) is as simple as

    uint64_t sum = 0;  uint32_t *data, *key;
    for (i=0; i < elements; i+=2)
        sum  +=  uint64_t(data[i] + key[i]) * (data[i+1] + key[i+1]);

The main loop

Benchmark

Benchmark measures overall hash speed as well as internal loop speed. The internal loop speed is a hard limit for the speed of any future FARSH version, while the overall speed includes time required for pretty slow high-level hashing. Future versions should replace it with faster algorithm still satisfying the SMHasher requirements, making overall hash speed within 10% of the internal loop speed.

Executables were compiled with GCC 4.9.2. Aligned versions make sure that data being hashed are 64-byte aligned, unaligned versions make sure that data are unaligned. This makes big difference on Core2 and older Intel CPUs.

Intel Haswell i7-4770 3.9 GHz (AVX2), other IvyBridge to Skylake CPUs has pretty close performance/GHz:

ExecutableFARSH 0.2 speedInternal loop speed
aligned-farsh-x64-avx254.536 GB/s = 50.790 GiB/s65.645 GB/s = 61.137 GiB/s
aligned-farsh-x6431.162 GB/s = 29.022 GiB/s35.722 GB/s = 33.269 GiB/s
aligned-farsh-x86-avx240.279 GB/s = 37.513 GiB/s61.682 GB/s = 57.446 GiB/s
aligned-farsh-x86-sse225.221 GB/s = 23.489 GiB/s33.584 GB/s = 31.277 GiB/s
aligned-farsh-x866.255 GB/s = 5.825 GiB/s6.336 GB/s = 5.901 GiB/s
farsh-x64-avx246.024 GB/s = 42.863 GiB/s64.967 GB/s = 60.505 GiB/s
farsh-x6430.335 GB/s = 28.252 GiB/s34.891 GB/s = 32.495 GiB/s
farsh-x86-avx235.273 GB/s = 32.851 GiB/s57.252 GB/s = 53.320 GiB/s
farsh-x86-sse224.502 GB/s = 22.820 GiB/s33.325 GB/s = 31.037 GiB/s
farsh-x866.283 GB/s = 5.852 GiB/s6.763 GB/s = 6.299 GiB/s

Intel Pentium M processor 1.5 GHz (SSE2):

ExecutableFARSH 0.2 speedInternal loop speed
aligned-farsh-x86-sse22.625 GB/s = 2.444 GiB/s2.791 GB/s = 2.5 GiB/s
aligned-farsh-x861.664 GB/s = 1.550 GiB/s1.946 GB/s = 1.8 GiB/s
farsh-x86-sse22.025 GB/s = 1.886 GiB/s2.302 GB/s = 2.1 GiB/s
farsh-x861.471 GB/s = 1.370 GiB/s1.715 GB/s = 1.5 GiB/s

K10: AMD Athlon II X2 220 Processor 2.8 GHz (SSE3):

ExecutableFARSH 0.2 speedInternal loop speed
aligned-farsh-x6411.300 GB/s = 10.524 GiB/s14.446 GB/s = 13.454 GiB/s
aligned-farsh-x86-sse210.899 GB/s = 10.151 GiB/s13.280 GB/s = 12.368 GiB/s
aligned-farsh-x863.805 GB/s = 3.544 GiB/s5.089 GB/s = 4.740 GiB/s
farsh-x6412.823 GB/s = 11.943 GiB/s14.187 GB/s = 13.212 GiB/s
farsh-x86-sse210.933 GB/s = 10.182 GiB/s12.389 GB/s = 11.538 GiB/s
farsh-x863.786 GB/s = 3.526 GiB/s5.825 GB/s = 5.425 GiB/s

Piledriver: AMD A8-5500 APU 3.7 GHz (AVX):

ExecutableFARSH 0.2 speedInternal loop speed
aligned-farsh-x6417.130 GB/s = 15.953 GiB/s21.394 GB/s = 19.924 GiB/s
aligned-farsh-x86-sse213.790 GB/s = 12.843 GiB/s20.830 GB/s = 19.400 GiB/s
aligned-farsh-x863.872 GB/s = 3.606 GiB/s5.457 GB/s = 5.082 GiB/s
farsh-x6415.313 GB/s = 14.262 GiB/s19.659 GB/s = 18.309 GiB/s
farsh-x86-sse213.812 GB/s = 12.863 GiB/s18.977 GB/s = 17.674 GiB/s
farsh-x863.959 GB/s = 3.687 GiB/s5.056 GB/s = 4.709 GiB/s

More results and benchmarking executables are available in those forum posts.

Competition

Fast non-cryptographic hashes:

Further reading:

MAC/PRF, i.e. cryprographically secure keyed hashes: