Awesome
SCRU64: Sortable, Clock-based, Realm-specifically Unique identifier
SCRU64 ID offers compact, time-ordered unique identifiers generated by distributed nodes. SCRU64 has the following features:
- 63-bit non-negative integer storable as signed/unsigned 64-bit integer
- Sortable by generation time (as integer and as text)
- 12-digit case-insensitive textual representation (Base36)
- ~38-bit Unix epoch-based timestamp that ensures useful life until year 4261
- Variable-length node/machine ID and counter fields that share 24 bits
Examples in the 12-digit canonical textual representation:
0u375nxqh5cq
0u375nxqh5cr
0u375nxqh5cs
0u375nxqh5ct
0u375ny0glr0
0u375ny0glr1
0u375ny0glr2
0u375ny0glr3
SCRU64's uniqueness is realm-specific, i.e., dependent on the centralized assignment of node ID to each generator. If you need decentralized, globally unique time-ordered identifiers, consider SCRU128. See the following comparison table for the properties of the two schemes.
SCRU64 | SCRU128 | |
---|---|---|
Long name | Sortable, Clock-based, Realm-specifically Unique identifier | Sortable, Clock and Random number-based Unique identifier |
Binary size | 63 bits | 128 bits |
Textual size | 12 digits | 25 digits |
Textual encoding | Base36 with 0-9a-z (case-insensitive) | Base36 with 0-9a-z (case-insensitive) |
Timestamp range | January 1, 1970 - February 27, 4261 (UTC) | January 1, 1970 - August 2, 10889 (UTC) |
Timestamp resolution | 256 milliseconds | 1 millisecond |
Number of IDs per timestamp | Up to 8.4 million per 256 milliseconds per node (configurable) | 140.7 trillion per millisecond per node (on average) |
Number of distributed nodes | Up to 8.4 million (configurable) | No specific limitation |
Source of uniqueness | Centrally pre-assigned node ID | Independently generated random numbers |
Choose it when you ... | Prefer 64-bit integer for storage, indexing, and other reasons | Want unique IDs without hassle to coordinate generators |
Implementations
Specification v1.0.0
A SCRU64 ID is a non-negative integer less than 36^12
(approx. 2^62
)
consisting of three terms:
timestamp * 2^24 + node_id * 2^(24 - node_id_size) + counter
Where:
timestamp
is a non-negative integer less than36^12 / 2^24
(from zero to282429536480
) representing a 256-millisecond-precision Unix timestamp (i.e., the number of 256 milliseconds elapsed since 1970-01-01 00:00:00+00:00, ignoring leap seconds; or a Unix timestamp in milliseconds divided by 256).node_id
is anode_id_size
-bit unsigned integer uniquely assigned to each SCRU64 generator in a relevant scope, wherenode_id_size
is an integer between 1 and 23, inclusive.- The method to assign unique
node_id
s is implementation-dependent and is out of the scope of this specification. node_id_size
may be chosen arbitrarily and may vary from node to node as long as the leading bits of a longernode_id
do not collide with shorternode_id
s.
- The method to assign unique
counter
is a24 - node_id_size
-bit unsigned integer incremented by one whenever a generator produces a new ID.counter
is reset to a random number whentimestamp
moves forward.counter
should be reset to a random number of the fullcounter
size but may be reset to a smaller-bit (e.g., 15-bit for a 16-bitcounter
) random number to reserve the leading bits as an overflow guard to guarantee the room for a certain number of IDs within atimestamp
tick.
This definition is equivalent to the following bitwise operations using 64-bit integers:
timestamp << 24 | node_id << (24 - node_id_size) | counter
For the following value ranges:
timestamp = unix_timestamp_in_milliseconds >> 8
0 <= timestamp <= 282429536480
1 <= node_id_size <= 23
0 <= node_id < 1 << node_id_size
0 <= counter < 1 << (24 - node_id_size)
Binary representation
A SCRU64 ID is usually represented as a 64-bit unsigned integer but may be
expressed as a signed integer, a byte array, or any other form as long as it
encodes integers from zero to the maximum value (36^12 - 1
).
Textual representation
A SCRU64 ID is encoded in a string using the Base36 encoding. The Base36
denotes a SCRU64 ID as an unsigned integer in the radix of 36 using the digits
of 0-9a-z
(0123456789abcdefghijklmnopqrstuvwxyz
), with leading zeros added
to form a 12-digit canonical representation. The following pseudo equation
illustrates the encoding algorithm:
109959589539758421
= 0 * 36^11 + 30 * 36^10 + 2 * 36^9 + ... + 4 * 36^2 + 11 * 36^1 + 9
= '0' * 36^11 + 'u' * 36^10 + '2' * 36^9 + ... + '4' * 36^2 + 'b' * 36^1 + '9'
= "0u2pf62ji4b9"
The maximum value (36^12 - 1
) of SCRU64 ID is defined as the maximum value
denotable in a 12-digit Base36 string (i.e., zzzzzzzzzzzz
).
For the sake of uniformity, an encoder should use lowercase letters in encoding IDs. A decoder, on the other hand, must always ignore cases when interpreting or lexicographically sorting encoded IDs.
The Base36 encoding shown above is available by default in several languages
(e.g., strconv.FormatUint()
and strconv.ParseUint()
in Go). Another simple
way to implement it is by using 64-bit integer division and modulo operations.
The following C code illustrates the straightforward algorithm:
uint64_t num = UINT64_C(109959589539758421);
static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
char text[13];
for (int i = 11; i >= 0; i--) {
text[i] = digits[num % 36];
num /= 36;
}
text[12] = '\0';
puts(text); // 0u2pf62ji4b9
Special-purpose IDs
The IDs with timestamp
set at zero or 36^12 / 2^24 - 1
are reserved for
special purposes (e.g., use as dummy, error, or example values) and must not be
used or assigned as an identifier of anything.
Considerations
Quality of random numbers
The random numbers used need not be of cryptographic quality because small
random numbers are insecure anyway. This specification introduces randomness not
as a source of uniqueness or unguessability but primarily as a thin protection
against unintended duplication of node_id
s by accidents and mistakes.
Identifying node by node_id
Implementations should not extract the node_id
from a SCRU64 ID to identify
the generator because:
node_id
is not necessarily a persistent ID of a single node and may change over time.node_id_size
may also change over time to adjust the trade-off between the number of distributed nodes and the number of IDs per node.
node_id
is embedded in an ID solely for the purpose of uniqueness guarantee
across distributed nodes, and thus the use of node_id
for any other reason may
harm the extensibility of the variable node_id_size
design.
Counter overflow handling
Counter overflow occurs when the counter
field does not provide sufficient
space for the IDs generated within a timestamp
tick. The counter
of SCRU64
IDs tends to overflow when the workload is high because the scheme is only able
to spare a limited number of bits for counter
. Therefore, generators must
implement reasonable logic to handle such overflows. The recommended approach is
to increment timestamp
and continue in the following way:
- Increment
timestamp
by one. - Reset
counter
to a random number.
This approach is recommended over other options such as the "sleep till next tick" approach because this technique allows the generation of monotonically ordered IDs in a non-blocking manner. Raising an error on a counter overflow is generally not recommended because a counter overflow is not a fault of users of SCRU64.
This approach results in a greater timestamp
value than the real-time clock.
Such a gap between timestamp
and the wall clock should be handled as a small
clock rollback discussed below.
Clock rollback handling
A SCRU64 generator relies on a real-time clock to ensure the monotonic order of generated IDs; therefore, it cannot guarantee monotonicity when the clock moves back. When a generator detects a clock rollback by comparing the up-to-date timestamp from the system clock and the one embedded in the last generated ID, the recommended treatment is:
- If the rollback is small enough (e.g., a few seconds), treat the
timestamp
of the last generated ID as the up-to-date one, betting that the wall clock will catch up soon. - Otherwise, stall the generator and wait for the next
timestamp
tick, reset the generator with another uniquenode_id
, or abort and raise an error.
This approach ensures a prompt response when a clock rollback is small, while
the generator behavior will be implementation-dependent if the clock rollback is
significant, or if the demand for IDs is so high that the counter overflow
handling discussed above results in a timestamp
significantly advanced from
the current timestamp.
Resetting timestamp
without refreshing node_id
is highly discouraged because
it results in a very high risk of duplicate IDs.
Informative usage notes
Node spec API
The reference implementations include a default global generator that reads the
node_id
and node_id_size
configuration from an environment variable. Pass a
unique node configuration encoded in a node specifier string through the
SCRU64_NODE_SPEC
environment variable to invoke a process, and the invoked
application will have access to the global SCRU64 generator configured with the
given node information. For example:
SCRU64_NODE_SPEC=42/8 npx scru64 -n 4
A node spec string starts with a decimal node_id
, a hexadecimal node_id
prefixed by "0x", or a 12-digit node_prev
SCRU64 ID value, followed by a slash
and a decimal node_id_size
value ranging from 1 to 23 (e.g., "42/8",
"0xb00/12", "0u2r85hm2pt3/16"). The first and second forms create a fresh new
generator with the given node_id
, while the third form constructs one that
generates subsequent SCRU64 IDs to the node_prev
.
The global generator raises an error if a proper node spec is not passed, because a SCRU64 generator is not able to generate a unique ID without knowledge of a unique node configuration.
Variable node_id_size
use cases
SCRU64 employs the variable node_id_size
design to pursue the right balance
between the number of distributed nodes and the number of IDs generated by each
node within a 256-millisecond timestamp
tick. If an application consists of
many independent nodes that generate a few IDs per timestamp
, a large
node_id_size
will fit. If an application deploys a few nodes that generate
many IDs, then a small node_id_size
should be used.
With the variable-size design, an application can operate short node_id
s and
long node_id
s together if the leading bits of longer node_id
s are carefully
assigned not to collide with short node_id
s. This approach is useful to mix
hot nodes that generate many and cold nodes that do not. For example:
# Hot nodes take 0b0000, 0b0001, ..., 0b0111
SCRU64_NODE_SPEC=0x0/4 launch_hot_node
SCRU64_NODE_SPEC=0x1/4 launch_hot_node
# Cold nodes use 0b1000_0000, 0b1000_0001, ..., 0b1111_1111
SCRU64_NODE_SPEC=0x80/8 launch_cold_node
SCRU64_NODE_SPEC=0x81/8 launch_cold_node
SCRU64_NODE_SPEC=0x82/8 launch_cold_node
SCRU64_NODE_SPEC=0x83/8 launch_cold_node
Node configurations may vary from time to time. Theoretically, anode_id
must
be unique in a single timestamp
tick only, so two generators sharing one
node_id
will not collide with each other as long as they operate in totally
different timestamp
spaces. A typical use case of this property is to expire
all the node_id
s at the end of a day and lease new ones that are adaptively
customized based on the stats of yesterday.
License
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.