Awesome

<p align="center"> <img src="z-order.png" alt="Z-Order Curve" height="250"> </p> <h1 align="center">Morton ND</h1>

A header-only Morton encode/decode library (C++14) capable of encoding from and decoding to N-dimensional space.

All algorithms are generated at compile-time for the number of dimensions and field width used. This way, loops and branches are not required.

Includes a hardware-based approach (using Intel BMI2) for most Intel CPUs, as well as another fast approach based on Lookup Table (LUT) methods for other CPU variants.

Status

Features

Encode/Decode Support

any number of dimensions (e.g. 2D, 3D, 4D ... ND).
built-in support for up to 128-bit native results (__uint128_t). Unlimited using a user-supplied "big integer" class (not yet tested).
constexpr encoding and decoding, allowing Morton coding to be expressed at compile-time.

Encoders and Decoders

Hardware (Intel BMI2)

Supports encoding and decoding in N dimensions, using Intel's BMI2 ISA extension (available in Haswell (Intel), Excavator (AMD) and newer).

See the Morton ND BMI2 Usage Guide for details.

using MortonND_4D = mortonnd::MortonNDBmi<4, uint64_t>;

// Encodes 4 fields into a uint64_t result.
auto encoding = MortonND_4D::Encode(f1, f2, f3, f4);

// Decodes 4 fields.
std::tie(f1, f2, f3, f4) = MortonND_4D::Decode(encoding);

Lookup Table (LUT)

Supports encoding and decoding in N dimensions, using compiler-generated LUTs.

LUTs are defined with constant expressions and thus can be generated (and even used) at compile-time.

Both the encoder and decoder support chunking, allowing a LUT smaller than the input field width when encoding, or smaller than the Morton code width when decoding, to be used internally. This is useful for applications which require faster compilation times and smaller binaries (at the expense of extra bit manipulation operations required to combine chunks at runtime).

See the Morton ND LUT Usage Guide for details.

Encoding

// Generates a 4D LUT encoder (4 fields, 16 bits each, 8-bit LUT) using the compiler.
constexpr auto MortonND_4D_Enc = mortonnd::MortonNDLutEncoder<4, 16, 8>();

// Encodes 4 fields. Can be done at run-time or compile-time.
auto encoding = MortonND_4D_Enc.Encode(f1, f2, f3, f4);

Decoding

// Generates a 4D LUT decoder (4 fields, 16 bits each, 8-bit LUT) using the compiler.
constexpr auto MortonND_4D_Dec = mortonnd::MortonNDLutDecoder<4, 16, 8>();

// Decodes 4 fields. Can be done at run-time or compile-time (just not with std::tie in C++14).
std::tie(f1, f2, f3, f4) = MortonND_4D_Dec.Decode(encoding);

Testing and Performance

Validation testing specific to MortonND is located in the /tests folder, covering N-dimensional configurations where N ∈ { 1, 2, 3, 4, 5, 8, 16, 32, 64 } for common field sizes, and is run as part of Travis CI.

Performance benchmark tests (and additional validation) for 2D and 3D use cases are located in a separate repository. See this fork of @Forceflow's Libmorton, which integrates Morton ND into Libmorton's existing test framework.

Benchmarks

The snippets below show performance comparisons between various 3D configurations of Morton ND. Comparisons to the 3D algorithms found in Libmorton are also included to demonstrate that Morton ND's generated algorithms are as efficient as hand-coded algorithms.

To run these tests (and more!) on your own machine, clone the fork linked above.

The following metrics (sorted by random access time, ascending) were collected on an I9-9980HK, compiled with GCC 11.1.0 on macOS 11.1 using -O3 -DNDEBUG. Results include data from both linearly increasing and random inputs to demonstrate the performance impact of cache (hit or miss) under each algorithm / configuration. Results are averaged over 5 runs (each algorithm is run 5 times consecutively before moving on to the next).

32-bit

++ Running each performance test 5 times and averaging results
++ Encoding 512^3 morton codes (134217728 in total)

    Linear      Random
    ======      ======
    524.178 ms  515.768 ms : 32-bit (MortonND)    LUT: 1 chunks, 10 bit LUT
    530.421 ms  536.726 ms : 32-bit (lib-morton)  BMI2 instruction set
    539.061 ms  530.954 ms : 32-bit (MortonND)    BMI2
    629.888 ms  628.029 ms : 32-bit (lib-morton)  LUT Shifted
    647.075 ms  646.281 ms : 32-bit (MortonND)    LUT: 2 chunks, 8 bit LUT
    658.859 ms  667.313 ms : 32-bit (MortonND)    LUT: 2 chunks, 5 bit LUT
    668.371 ms  665.673 ms : 32-bit (lib-morton)  LUT
    
++ Decoding 512^3 morton codes (134217728 in total)

    Linear      Random
    ======      ======
    560.478 ms  3159.143 ms : 32-bit (lib-morton)  BMI2 Instruction set
    569.632 ms  3188.762 ms : 32-bit (MortonND)    MortonND: BMI2
    807.765 ms  3386.069 ms : 32-bit (MortonND)    LUT: 3 chunks, 10 bit LUT
    870.854 ms  3479.404 ms : 32-bit (lib-morton)  LUT Shifted
    902.272 ms  3516.063 ms : 32-bit (MortonND)    LUT: 4 chunks, 8 bit LUT
    1033.880 ms 3605.081 ms : 32-bit (lib-morton)  LUT
    1162.954 ms 3755.352 ms : 32-bit (MortonND)    LUT: 6 chunks, 5 bit LUT

64-bit

++ Running each performance test 5 times and averaging results
++ Encoding 512^3 morton codes (134217728 in total)

    Linear      Random
    ======      ======
    524.044 ms  563.434 ms : 64-bit (lib-morton)  BMI2 instruction set
    536.608 ms  587.632 ms : 64-bit (MortonND)    BMI2
    632.564 ms  639.866 ms : 64-bit (MortonND)    LUT: 2 chunks, 11 bit LUT
    812.262 ms  817.026 ms : 64-bit (MortonND)    LUT: 3 chunks, 8 bit LUT
    812.683 ms  823.962 ms : 64-bit (MortonND)    LUT: 3 chunks, 7 bit LUT
    828.591 ms  845.098 ms : 64-bit (lib-morton)  LUT Shifted
    612.973 ms  863.919 ms : 64-bit (MortonND)    LUT: 2 chunks, 16 bit LUT
    909.336 ms  929.829 ms : 64-bit (lib-morton)  LUT
    516.114 ms 1003.164 ms : 64-bit (MortonND)    LUT: 1 chunks, 21 bit LUT
    
++ Decoding 512^3 morton codes (134217728 in total)

    Linear      Random
    ======      ======
    575.827 ms  3195.522 ms : 64-bit (MortonND)    BMI2
    560.200 ms  3204.435 ms : 64-bit (lib-morton)  BMI2 Instruction set
    932.061 ms  3600.856 ms : 64-bit (MortonND)    LUT: 4 chunks, 16 bit LUT
    1164.335 ms 3758.155 ms : 64-bit (MortonND)    LUT: 6 chunks, 11 bit LUT
    1300.058 ms 3912.560 ms : 64-bit (lib-morton)  LUT Shifted
    1518.749 ms 4144.474 ms : 64-bit (MortonND)    LUT: 9 chunks, 7 bit LUT
    1387.119 ms 4039.352 ms : 64-bit (MortonND)    LUT: 8 chunks, 8 bit LUT
    1564.913 ms 4150.859 ms : 64-bit (lib-morton)  LUT
    833.821 ms  4496.993 ms : 64-bit (MortonND)    LUT: 3 chunks, 21 bit LUT

Installation

CMake

Morton ND provides a CMake integration, making it easy to use from your CMake project.

Adding the dependency

If you've installed Morton ND to your CMAKE_MODULE_PATH (e.g. with vcpkg), find and link it like this:

find_package(morton-nd CONFIG REQUIRED)
target_link_libraries(main PRIVATE morton-nd::MortonND)

Otherwise, if you're using Morton ND as a Git submodule:

add_subdirectory(morton-nd)
target_link_libraries(main PRIVATE morton-nd::MortonND)

Including library headers

By using target_link_libraries(...) as above, Morton ND's headers will be automatically available for use by your target:

// main.cpp
#include <morton-nd/mortonND_BMI2.h>
#include <morton-nd/mortonND_LUT.h>

Thanks

Jeroen Baert (@Forceflow)
- Morton encoding/decoding through bit interleaving: Implementations
- libmorton, a C++11 header-only library for 2D and 3D Morton encoding and decoding.

License

This project is licensed under the MIT license.

Attribution is appreciated where applicable, and as such, a NOTICE file is included which may be distributed in the credits of derivative software.