Home

Awesome

SCRU64: Sortable, Clock-based, Realm-specifically Unique identifier

SCRU64 ID offers compact, time-ordered unique identifiers generated by distributed nodes. SCRU64 has the following features:

Examples in the 12-digit canonical textual representation:

0u375nxqh5cq
0u375nxqh5cr
0u375nxqh5cs
0u375nxqh5ct
0u375ny0glr0
0u375ny0glr1
0u375ny0glr2
0u375ny0glr3

SCRU64's uniqueness is realm-specific, i.e., dependent on the centralized assignment of node ID to each generator. If you need decentralized, globally unique time-ordered identifiers, consider SCRU128. See the following comparison table for the properties of the two schemes.

SCRU64SCRU128
Long nameSortable, Clock-based, Realm-specifically Unique identifierSortable, Clock and Random number-based Unique identifier
Binary size63 bits128 bits
Textual size12 digits25 digits
Textual encodingBase36 with 0-9a-z (case-insensitive)Base36 with 0-9a-z (case-insensitive)
Timestamp rangeJanuary 1, 1970 - February 27, 4261 (UTC)January 1, 1970 - August 2, 10889 (UTC)
Timestamp resolution256 milliseconds1 millisecond
Number of IDs per timestampUp to 8.4 million per 256 milliseconds per node (configurable)140.7 trillion per millisecond per node (on average)
Number of distributed nodesUp to 8.4 million (configurable)No specific limitation
Source of uniquenessCentrally pre-assigned node IDIndependently generated random numbers
Choose it when you ...Prefer 64-bit integer for storage, indexing, and other reasonsWant unique IDs without hassle to coordinate generators

Implementations

Specification v1.0.0

A SCRU64 ID is a non-negative integer less than 36^12 (approx. 2^62) consisting of three terms:

timestamp * 2^24 + node_id * 2^(24 - node_id_size) + counter

Where:

This definition is equivalent to the following bitwise operations using 64-bit integers:

timestamp << 24 | node_id << (24 - node_id_size) | counter

For the following value ranges:

timestamp = unix_timestamp_in_milliseconds >> 8
0 <= timestamp    <= 282429536480
1 <= node_id_size <= 23
0 <= node_id      <  1 << node_id_size
0 <= counter      <  1 << (24 - node_id_size)

Binary representation

A SCRU64 ID is usually represented as a 64-bit unsigned integer but may be expressed as a signed integer, a byte array, or any other form as long as it encodes integers from zero to the maximum value (36^12 - 1).

Textual representation

A SCRU64 ID is encoded in a string using the Base36 encoding. The Base36 denotes a SCRU64 ID as an unsigned integer in the radix of 36 using the digits of 0-9a-z (0123456789abcdefghijklmnopqrstuvwxyz), with leading zeros added to form a 12-digit canonical representation. The following pseudo equation illustrates the encoding algorithm:

109959589539758421
    =  0  * 36^11 + 30  * 36^10 +  2  * 36^9 + ... +  4  * 36^2 + 11  * 36^1 +  9
    = '0' * 36^11 + 'u' * 36^10 + '2' * 36^9 + ... + '4' * 36^2 + 'b' * 36^1 + '9'
    = "0u2pf62ji4b9"

The maximum value (36^12 - 1) of SCRU64 ID is defined as the maximum value denotable in a 12-digit Base36 string (i.e., zzzzzzzzzzzz).

For the sake of uniformity, an encoder should use lowercase letters in encoding IDs. A decoder, on the other hand, must always ignore cases when interpreting or lexicographically sorting encoded IDs.

The Base36 encoding shown above is available by default in several languages (e.g., strconv.FormatUint() and strconv.ParseUint() in Go). Another simple way to implement it is by using 64-bit integer division and modulo operations. The following C code illustrates the straightforward algorithm:

uint64_t num = UINT64_C(109959589539758421);

static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
char text[13];
for (int i = 11; i >= 0; i--) {
  text[i] = digits[num % 36];
  num /= 36;
}
text[12] = '\0';
puts(text); // 0u2pf62ji4b9

Special-purpose IDs

The IDs with timestamp set at zero or 36^12 / 2^24 - 1 are reserved for special purposes (e.g., use as dummy, error, or example values) and must not be used or assigned as an identifier of anything.

Considerations

Quality of random numbers

The random numbers used need not be of cryptographic quality because small random numbers are insecure anyway. This specification introduces randomness not as a source of uniqueness or unguessability but primarily as a thin protection against unintended duplication of node_ids by accidents and mistakes.

Identifying node by node_id

Implementations should not extract the node_id from a SCRU64 ID to identify the generator because:

node_id is embedded in an ID solely for the purpose of uniqueness guarantee across distributed nodes, and thus the use of node_id for any other reason may harm the extensibility of the variable node_id_size design.

Counter overflow handling

Counter overflow occurs when the counter field does not provide sufficient space for the IDs generated within a timestamp tick. The counter of SCRU64 IDs tends to overflow when the workload is high because the scheme is only able to spare a limited number of bits for counter. Therefore, generators must implement reasonable logic to handle such overflows. The recommended approach is to increment timestamp and continue in the following way:

  1. Increment timestamp by one.
  2. Reset counter to a random number.

This approach is recommended over other options such as the "sleep till next tick" approach because this technique allows the generation of monotonically ordered IDs in a non-blocking manner. Raising an error on a counter overflow is generally not recommended because a counter overflow is not a fault of users of SCRU64.

This approach results in a greater timestamp value than the real-time clock. Such a gap between timestamp and the wall clock should be handled as a small clock rollback discussed below.

Clock rollback handling

A SCRU64 generator relies on a real-time clock to ensure the monotonic order of generated IDs; therefore, it cannot guarantee monotonicity when the clock moves back. When a generator detects a clock rollback by comparing the up-to-date timestamp from the system clock and the one embedded in the last generated ID, the recommended treatment is:

  1. If the rollback is small enough (e.g., a few seconds), treat the timestamp of the last generated ID as the up-to-date one, betting that the wall clock will catch up soon.
  2. Otherwise, stall the generator and wait for the next timestamp tick, reset the generator with another unique node_id, or abort and raise an error.

This approach ensures a prompt response when a clock rollback is small, while the generator behavior will be implementation-dependent if the clock rollback is significant, or if the demand for IDs is so high that the counter overflow handling discussed above results in a timestamp significantly advanced from the current timestamp.

Resetting timestamp without refreshing node_id is highly discouraged because it results in a very high risk of duplicate IDs.

Informative usage notes

Node spec API

The reference implementations include a default global generator that reads the node_id and node_id_size configuration from an environment variable. Pass a unique node configuration encoded in a node specifier string through the SCRU64_NODE_SPEC environment variable to invoke a process, and the invoked application will have access to the global SCRU64 generator configured with the given node information. For example:

SCRU64_NODE_SPEC=42/8 npx scru64 -n 4

A node spec string starts with a decimal node_id, a hexadecimal node_id prefixed by "0x", or a 12-digit node_prev SCRU64 ID value, followed by a slash and a decimal node_id_size value ranging from 1 to 23 (e.g., "42/8", "0xb00/12", "0u2r85hm2pt3/16"). The first and second forms create a fresh new generator with the given node_id, while the third form constructs one that generates subsequent SCRU64 IDs to the node_prev.

The global generator raises an error if a proper node spec is not passed, because a SCRU64 generator is not able to generate a unique ID without knowledge of a unique node configuration.

Variable node_id_size use cases

SCRU64 employs the variable node_id_size design to pursue the right balance between the number of distributed nodes and the number of IDs generated by each node within a 256-millisecond timestamp tick. If an application consists of many independent nodes that generate a few IDs per timestamp, a large node_id_size will fit. If an application deploys a few nodes that generate many IDs, then a small node_id_size should be used.

With the variable-size design, an application can operate short node_ids and long node_ids together if the leading bits of longer node_ids are carefully assigned not to collide with short node_ids. This approach is useful to mix hot nodes that generate many and cold nodes that do not. For example:

# Hot nodes take 0b0000, 0b0001, ..., 0b0111
SCRU64_NODE_SPEC=0x0/4 launch_hot_node
SCRU64_NODE_SPEC=0x1/4 launch_hot_node

# Cold nodes use 0b1000_0000, 0b1000_0001, ..., 0b1111_1111
SCRU64_NODE_SPEC=0x80/8 launch_cold_node
SCRU64_NODE_SPEC=0x81/8 launch_cold_node
SCRU64_NODE_SPEC=0x82/8 launch_cold_node
SCRU64_NODE_SPEC=0x83/8 launch_cold_node

Node configurations may vary from time to time. Theoretically, anode_id must be unique in a single timestamp tick only, so two generators sharing one node_id will not collide with each other as long as they operate in totally different timestamp spaces. A typical use case of this property is to expire all the node_ids at the end of a day and lease new ones that are adaptively customized based on the stats of yesterday.

License

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.