Home

Awesome

VarU64

A simple variable-length encoding for unsigned 64 bit integers.

Specification

To decode a VarU64, look at the first byte. If its value is below 248, the value itself is the encoded number. Else, the first byte determines the further length of the encoding:

first bytenumber of additional bytes
2481
2492
2503
2514
2525
2536
2547
2558

Following the first byte are length many bytes. These bytes are the big-endian representation of the encoded number.

Canonical VarU64

This specification allows different ways to encode the same number, e.g. 0x00 and 0xf800 both decode to zero. To obtain a canonical encoding, we limit those possibilities: Of all possible representations for a number that this scheme admits, the shortest one is its unique, canonical encoding. Canonical decoders must indicate an error if a value uses an encoding that is longer than necessary.

Variable-Length Encodings of Other Fixed-Size Integers

Analogously to this specification, you can encode signed integers by using big-endian two's complement representation.

The specification can also be adapted to other sizes of integers (up to 2040 bit inclusive), by starting the length table at a different value. For example, the VarI32 encoding for signed 32-bit integers is defined as follows:

To decode a VarI32, look at the first byte. If its (unsigned) value is below 252, the value itself is the encoded number in two's complement. Else, the first byte determines the further length of the encoding:

first byte (unsigned)number of additional bytes
2521
2532
2543
2554

Following the first byte are length many bytes. These bytes are the big-endian, two's complement representation of the encoded number.

Non-Zero Unsigned Integers

The VarNonZeroU64 encoding for nonzero 64-bit integers consists of subtracting one from the integer and encoding the resulting number as a regular VarU64. Unsigned integers of other size work analogously.

Greater-Than X Unsigned Integers

Non-zero unsigned integers can be generalized to unsigned integers strictly greater than some number X: The VarGtXU64 encoding for 64-bit integer strictly great then X consists of subtracting X from the integer and encoding the resulting number as a regular VarU64. Unsigned integers of other size work analogously.

Remarks

This has been inspired by the issues in the multiformats varint repository, in particular issues #8 and #12.