Home

Awesome

FSharp.HashCollections

Persistent hash based map implementation

Made for my own purposes the goal is to allow faster lookup table performance in F#. After testing some the built in F# Map and other collection implementations across the .NET ecosystem and not being totally satisfied with either the performance and/or the API compromises exposed I decided to code my own for both fun and profit. This is the end result implemented as a customised/optimised HAMT (Hash Mapped Array Trie) for the .NET runtime.

NuGet Badge

Goals

  1. More efficient persistent collection type where F#'s Map type isn't fast enough.
    • At time of writing this was the fastest immutable collections library in the .NET ecosystem I could find (including BCL classes). See Performance Benchmarks for details.
    • Tailor the algorithms used to the .NET ecosystem to achieve better performance.
  2. Allow a range of key types and equality logic to be used even if provided by consumers without sacrificing performance (e.g. no need for keys to be comparable).
    • Custom equality comparers can be provided, unlike the standard F# Map/Set data types.
  3. Provide an idiomatic API to F#. The library ideally should allow C# usage/interop if required.
    • HashMap and HashSet static classes are usable from C# as wrappers. Performance optimisations (e.g. inlining) are applied at compile time where possible.
  4. Maintainable to an average F# developer.
    • For example minimising inheritance/object hierarchies and casting (unlike some other impl's I've seen), performant whilst still idiomatic code, etc.
    • Use F# strengths to increase performance further (e.g. inlining + DU's to avoid method calls and copying overhead affecting struct key performance).
  5. Performance at least at the same scale as other languages of the same class/abstraction level (e.g JVM, etc).

TLDR; Benefits of immutable collections while minimising the cost (e.g. performance, maintainable code, idiomatic code, etc).

Use Cases

Collection Types Provided

All collections are persisted/immutable by nature so any Add/Remove operation produces a new collection instance. Most methods mirror the F# built in Map and Set module (e.g. Map.tryFind vs HashMap.tryFind) allowing in many cases this library to be used as a drop in replacement where appropriate.

OperationComplexity
TryFindO(log32n)
AddO(log32n)
RemoveO(log32n)
CountO(1)
EqualsO(n)

Example Usage:

open FSharp.HashCollections
let hashMapResult = HashMap.empty |> HashMap.add k v |> HashMay.tryFind k // Result is ValueSome(v)
OperationComplexity
ContainsO(log32n)
AddO(log32n)
RemoveO(log32n)
CountO(1)
EqualsO(n)

Example Usage:

open FSharp.HashCollections
let hashMapResult = HashSet.empty |> HashSet.add k |> HashSet.contains k // Result is true

Equality customisation

By default any "empty" seeded HashSet/Map uses F# HashIdentity.Structural comparison to determine equality and calculate hashcodes. This is in line with the F# collection types.

In addition all collection types allow custom equality to be assigned if required. The equality is encoded as a type on the collection so equality and hashing operations are consistent. Same collection types with different equality comparers can not be used interchangeably by operations provided. To use:

// Uses the default equality template provided.
let defaultIntHashMap : HashMap<int, int64> = HashMap.empty

// Uses a custom equality template provided by the type parameter.
let customEqIntHashMap : HashMap<int, int64, CustomEqualityComparer> = HashMap.emptyWithComparer

Any equality comparer specified in the type signature must:

An example with the type "Int" for the custom equality comparer (which in testing exhibits slightly faster perf than the default):

type IntEqualityTemplate =
    struct end
    interface System.Collections.Generic.IEqualityComparer<int> with
        member __.Equals(x: int, y: int): bool = x = y
        member __.GetHashCode(obj: int): int = hash obj // Or just obj

module Usage =
    // Type is HashMap<int, int64, IntEqualityTemplate>
    let empty = HashMap.emptyWithComparer<_, int64, IntEqualityTemplate>

Performance Benchmarks

TryFind on HashMap

Keys are of type int32 where "GetHashMap" represents this library's HashMap collection.

All are using F# HashIdentity.Structural comparison.

See ReadBenchmark


BenchmarkDotNet=v0.13.1, OS=arch 
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.403
  [Host]     : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT DEBUG
  DefaultJob : .NET 6.0.11 (6.0.1122.52304), X64 RyuJIT


MethodCollectionSizeMeanErrorStdDev
GetHashMap1011.49 ns0.091 ns0.081 ns
GetImToolsHashMap1031.58 ns0.190 ns0.169 ns
GetFSharpMap1026.97 ns0.208 ns0.194 ns
GetFSharpXHashMap10129.21 ns0.806 ns0.754 ns
GetSystemCollectionsImmutableMap1021.80 ns0.325 ns0.304 ns
GetFSharpDataAdaptiveMap1022.95 ns0.234 ns0.219 ns
GetFSharpxChampMap1072.75 ns0.439 ns0.367 ns
GetLangExtMap1024.83 ns0.218 ns0.193 ns
GetHashMap10011.75 ns0.120 ns0.112 ns
GetImToolsHashMap10044.99 ns0.358 ns0.317 ns
GetFSharpMap10047.45 ns0.418 ns0.391 ns
GetFSharpXHashMap100148.55 ns0.919 ns0.815 ns
GetSystemCollectionsImmutableMap10033.26 ns0.246 ns0.230 ns
GetFSharpDataAdaptiveMap10044.84 ns0.400 ns0.374 ns
GetFSharpxChampMap10083.54 ns1.017 ns0.951 ns
GetLangExtMap10031.26 ns0.264 ns0.247 ns
GetHashMap100011.67 ns0.090 ns0.084 ns
GetImToolsHashMap100069.63 ns0.974 ns0.911 ns
GetFSharpMap100071.16 ns0.762 ns0.713 ns
GetFSharpXHashMap1000161.89 ns0.539 ns0.450 ns
GetSystemCollectionsImmutableMap100049.93 ns0.489 ns0.458 ns
GetFSharpDataAdaptiveMap100067.05 ns0.725 ns0.678 ns
GetFSharpxChampMap100084.44 ns1.061 ns0.992 ns
GetLangExtMap100031.00 ns0.312 ns0.292 ns
GetHashMap10000033.47 ns0.376 ns0.314 ns
GetImToolsHashMap100000216.79 ns4.285 ns4.009 ns
GetFSharpMap100000177.11 ns2.074 ns1.940 ns
GetFSharpXHashMap100000231.84 ns4.178 ns6.627 ns
GetSystemCollectionsImmutableMap100000162.46 ns2.537 ns2.374 ns
GetFSharpDataAdaptiveMap100000154.96 ns2.743 ns2.566 ns
GetFSharpxChampMap100000114.73 ns1.243 ns1.163 ns
GetLangExtMap10000061.52 ns0.609 ns0.570 ns
GetHashMap50000035.67 ns0.622 ns0.582 ns
GetImToolsHashMap500000528.33 ns10.475 ns12.063 ns
GetFSharpMap500000327.39 ns6.410 ns11.881 ns
GetFSharpXHashMap500000336.89 ns5.012 ns4.688 ns
GetSystemCollectionsImmutableMap500000359.41 ns7.174 ns11.787 ns
GetFSharpDataAdaptiveMap500000324.83 ns6.091 ns5.982 ns
GetFSharpxChampMap500000122.62 ns1.997 ns2.137 ns
GetLangExtMap50000067.00 ns1.128 ns1.055 ns
GetHashMap75000038.42 ns0.513 ns0.428 ns
GetImToolsHashMap750000639.63 ns12.341 ns14.691 ns
GetFSharpMap750000429.89 ns8.344 ns10.247 ns
GetFSharpXHashMap750000446.19 ns5.108 ns4.778 ns
GetSystemCollectionsImmutableMap750000440.76 ns8.714 ns10.035 ns
GetFSharpDataAdaptiveMap750000364.25 ns7.225 ns6.758 ns
GetFSharpxChampMap750000128.73 ns2.571 ns3.848 ns
GetLangExtMap75000072.10 ns1.426 ns1.334 ns
GetHashMap100000041.58 ns0.804 ns0.752 ns
GetImToolsHashMap1000000716.12 ns13.576 ns12.699 ns
GetFSharpMap1000000478.64 ns9.391 ns11.877 ns
GetFSharpXHashMap1000000426.63 ns4.339 ns4.059 ns
GetSystemCollectionsImmutableMap1000000506.48 ns9.872 ns9.695 ns
GetFSharpDataAdaptiveMap1000000395.59 ns5.966 ns5.581 ns
GetFSharpxChampMap1000000132.05 ns2.585 ns4.319 ns
GetLangExtMap100000079.30 ns1.577 ns2.408 ns
GetHashMap5000000155.94 ns0.957 ns0.799 ns
GetImToolsHashMap50000001,138.25 ns18.869 ns17.650 ns
GetFSharpMap5000000823.41 ns10.984 ns10.274 ns
GetFSharpXHashMap5000000475.15 ns6.462 ns6.044 ns
GetSystemCollectionsImmutableMap5000000826.09 ns15.213 ns14.230 ns
GetFSharpDataAdaptiveMap5000000579.94 ns6.645 ns6.216 ns
GetFSharpxChampMap5000000285.59 ns3.235 ns3.026 ns
GetLangExtMap5000000234.92 ns3.571 ns3.341 ns
GetHashMap10000000159.87 ns2.199 ns1.950 ns
GetImToolsHashMap100000001,345.84 ns26.844 ns26.364 ns
GetFSharpMap10000000957.82 ns14.825 ns13.868 ns
GetFSharpXHashMap10000000509.87 ns9.244 ns8.647 ns
GetSystemCollectionsImmutableMap10000000960.39 ns18.921 ns17.699 ns
GetFSharpDataAdaptiveMap10000000672.11 ns9.232 ns8.636 ns
GetFSharpxChampMap10000000291.37 ns3.615 ns3.381 ns
GetLangExtMap10000000237.09 ns3.173 ns2.968 ns

Add on HashMap

Scenario: Adding 50 elements on top of a pre-defined collection with a collection size as specified, average time of each of the 50 inserts:

See AddBenchmark

OfSeq on HashMap

See OfSeq Benchmark

An outside .NET ecosystem comparison.

This section is simply a guide to give a ballpark comparison figure on performance with implementations from other languages that have standard HAMT implementations for my own technical selection evaluation.

TL;DR: The "Get" method where performance is significantly better as the collection scales up. For example at 10,000,000 FSharp collection is approx 3.59 faster, and 1.73x faster at building the initial hashmap.

LangOperation100100010000100000500000100000050000001000000050000000
F#TryFind0005379449511429346
ScalaTryFind003171112561863411132318
F#OfSeq003301462191321308019160
ScalaOfSeq005491633872516534743827

Platforms tested: Dotnet version: 5.0.301, Scala code runner version 2.13.6-20210529-211702.

Note that most of the optimisation work I have done is around the "Get" method given my use case (lots of reads, fewer but still significant write load with large collections 500,000+ items).

Design decisions that may affect consumers of this library

Any of these decisions may change in the future as I gain knowledge, change my mind, etc. It doesn't list all the tweaks, and changes caused by benchmarking just the things that affect consumers. Many besides equality checking shouldn't affect the API dramatically; and if they do it should remain easy to port code to the new API as appropriate.

  1. Writing in F# vs C#

    • Performance tweaks found in my trial and error experiments (structs, inlining, etc) are easier to unlock and use in F# requiring less code. Inlining is used for algorithm sharing across collection types, avoiding struct copying with returned results, etc.
  2. Count is a O(1) operation. This requires an extra bit of space per tree and minor overhead during insert and removal but allows other operations on the read side to be faster (e.g isEmpty, count, intersect, etc.).

    • ✅ Lower time complexity for existence and count operations.
    • ❌ Slightly more work required when inserting and removing elements keeping track of addition or removal success.
  3. NetCoreApp3.1 only or greater. This allows the use of .NET intrinsics and other performance enhancements.

    • ✅ Faster implementation.
    • ❌ Only Net Core 3.1 compatible or greater.