Architect.Identities

Reliable unique ID generation for distributed applications.

This package provides highly tuned tools for ID generation and management.

TLDR

The DistributedId is a single ID that combines the advantages of auto-increment IDs and UUIDs.

The DistributedId128 is a 128-bit UUID replacement with the advantages of the DistributedId and practically no rate limits or collisions, at the cost of more space.

For sensitive scenarios where zero metadata must be leaked from an ID, Public Identities can transform any ID into a public representation that reveals nothing, without ever introducing an unrelated secondary ID.

Introduction

Should entity IDs use UUIDs or auto-increment?

Auto-increment IDs are ill-suited for exposing publicly: they leak hints about the row count and are easy to guess. Moreover, they are generated very late, on insertion, posing challenges to the creation of aggregates.

UUIDs, on the other hand, tend to be random, causing poor performance as database/storage keys.

Using both types of ID on an entity is cumbersome and may leak a technical workaround into the domain model.

Luckily, we can do better.

Distributed IDs

The DistributedId is a UUID replacement that is generated on-the-fly (without orchestration), unique, hard to guess, easy to store and sort, and highly efficient as a database key.

A DistributedId is created as a 93-bit decimal value of 28 digits, but can also be represented as a (case-sensitive) 16-char alphanumeric value or as a Guid.
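These sizes can be checked with a little arithmetic. The C# sketch below (the class name is hypothetical, for illustration only) computes the largest 93-bit value and confirms that it has 28 decimal digits, fits in System.Decimal and in a SQL DECIMAL(28, 0) column, and needs exactly 16 characters when written with a 62-symbol case-sensitive alphanumeric alphabet, assuming that is the alphabet used.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: arithmetic checks only, not part of Architect.Identities.
public static class DistributedIdSizeSketch
{
    // Largest value representable in 93 bits: 2^93 - 1.
    public static readonly BigInteger MaxValue = BigInteger.Pow(2, 93) - 1;

    // Number of decimal digits needed for the largest value.
    public static int DecimalDigits => MaxValue.ToString().Length;

    // System.Decimal stores a 96-bit integer, so any 93-bit value fits.
    public static bool FitsInSystemDecimal => MaxValue <= new BigInteger(decimal.MaxValue);

    // DECIMAL(28, 0) holds integers up to 10^28 - 1.
    public static bool FitsInSqlDecimal28 => MaxValue < BigInteger.Pow(10, 28);

    // Assuming a 62-symbol alphabet (0-9, A-Z, a-z): the smallest width n with 62^n > MaxValue.
    public static int Base62Width
    {
        get
        {
            int width = 0;
            BigInteger capacity = BigInteger.One;
            while (capacity <= MaxValue)
            {
                capacity *= 62;
                width++;
            }
            return width;
        }
    }
}
```

Since 62^15 is roughly 7.7 × 10^26 while 2^93 is roughly 9.9 × 10^27, 15 characters fall short and 16 are required.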

Distributed applications can create unique DistributedIds with no synchronization mechanism between them. This holds true under almost any load. Even under extreme conditions, collisions (i.e. duplicates) tend to be far under 1 collision per 350 billion IDs generated.

DistributedIds are designed to be unique within a logical context, such as a database table, a Bounded Context, or even a whole company. These form the most common boundaries within which uniqueness is required. Any number of distributed applications may generate new IDs within such a context.

Note that a DistributedId reveals its creation timestamp, which may be considered sensitive data in certain contexts.

Example Value

Example Usage

using System;
using Architect.Identities;

decimal id = DistributedId.CreateId(); // 1088824355131185736905670087

// Alternatively, for a more compact representation, IDs can be encoded as alphanumeric text
string compactId = id.ToAlphanumeric(); // "3zfAkCP7ZtzfeQYp"
decimal originalId = AlphanumericIdEncoder.DecodeDecimalOrDefault(compactId)
	?? throw new ArgumentException("Not a valid encoded ID.");
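AlphanumericIdEncoder does the actual encoding. Purely to show the idea, here is a minimal fixed-width base-62 sketch. Its alphabet (0-9, A-Z, a-z in ASCII order) and the names Base62Sketch, Encode, and DecodeOrDefault are assumptions, so its output will not necessarily match the "3zfAkCP7ZtzfeQYp" example above.

```csharp
using System;
using System.Numerics;

// Hypothetical fixed-width base-62 codec; not the package's AlphanumericIdEncoder.
public static class Base62Sketch
{
    // Assumed alphabet, in ASCII order so that fixed-width strings sort like the numbers they encode.
    private const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    public const int Width = 16; // 62^16 > 2^93, so 16 characters fit any 93-bit value

    public static string Encode(decimal id)
    {
        if (id < 0 || id != decimal.Truncate(id))
            throw new ArgumentOutOfRangeException(nameof(id), "Expected a non-negative integer.");

        var value = new BigInteger(id);
        var chars = new char[Width];
        for (int i = Width - 1; i >= 0; i--) // fill from the least significant digit
        {
            value = BigInteger.DivRem(value, 62, out BigInteger remainder);
            chars[i] = Alphabet[(int)remainder];
        }
        if (!value.IsZero)
            throw new ArgumentOutOfRangeException(nameof(id), "Value needs more than 16 characters.");
        return new string(chars);
    }

    public static decimal? DecodeOrDefault(string encoded)
    {
        if (encoded is null || encoded.Length != Width) return null;

        BigInteger value = BigInteger.Zero;
        foreach (char c in encoded)
        {
            int digit = Alphabet.IndexOf(c); // case-sensitive lookup
            if (digit < 0) return null;
            value = value * 62 + digit;
        }
        // 62^16 - 1 is below decimal.MaxValue, so this conversion cannot overflow.
        return (decimal)value;
    }
}
```

Because every string has the same width and the alphabet is in ASCII order, an ordinal string comparison of two encoded IDs gives the same result as comparing the IDs themselves.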

For SQL databases, the recommended column type is DECIMAL(28, 0). Alternatively, a DistributedId can be stored as 16 case-sensitive ASCII characters, or even as a UUID. (The latter is discouraged, as storage engines differ in how they sort UUIDs.)

The ID generation can be controlled from the outside, such as in unit tests that require constant IDs:

[Fact]
public void ShowInversionOfControl()
{
	// A custom generator is included in the package
	const decimal fixedId = 1m;
	using (new DistributedIdGeneratorScope(new CustomDistributedIdGenerator(() => fixedId)))
	{
		var entity = new Entity(); // Constructor implementation uses DistributedId.CreateId()
		Assert.Equal(fixedId, entity.Id); // True
		
		// A simple incremental generator is included as well
		using (new DistributedIdGeneratorScope(new IncrementalDistributedIdGenerator(fixedId)))
		{
			Assert.Equal(1m, DistributedId.CreateId()); // True
			Assert.Equal(2m, DistributedId.CreateId()); // True
			Assert.Equal(3m, DistributedId.CreateId()); // True
		}
		
		Assert.Equal(fixedId, DistributedId.CreateId()); // True
	}
}
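DistributedIdGeneratorScope handles this for you. Purely to illustrate how such an ambient, nestable override can work, here is a minimal C# sketch. AmbientIdSketch and its nested Scope are hypothetical names, the counter-based default is a placeholder rather than a real DistributedId generator, and the package's actual implementation may differ.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of an ambient, scoped ID generator; not the package's implementation.
public static class AmbientIdSketch
{
    // Placeholder default generator for this sketch only: a process-wide counter.
    private static long fallbackCounter;
    private static readonly Func<decimal> DefaultGenerator = () => Interlocked.Increment(ref fallbackCounter);

    // AsyncLocal flows the current override along the logical call chain, including across awaits.
    private static readonly AsyncLocal<Func<decimal>> CurrentOverride = new AsyncLocal<Func<decimal>>();

    public static decimal CreateId() => (CurrentOverride.Value ?? DefaultGenerator)();

    public sealed class Scope : IDisposable
    {
        private readonly Func<decimal> previous;
        private bool disposed;

        public Scope(Func<decimal> generator)
        {
            previous = CurrentOverride.Value; // remember the outer scope, so scopes can nest
            CurrentOverride.Value = generator ?? throw new ArgumentNullException(nameof(generator));
        }

        public void Dispose()
        {
            if (disposed) return;
            disposed = true;
            CurrentOverride.Value = previous; // restore the outer generator (or none)
        }
    }
}
```

AsyncLocal<T> flows the override into awaited continuations, while a value set inside a child async method does not leak back to its caller, so unrelated concurrent flows, such as tests running in parallel, generally do not observe each other's scopes.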

Benefits

Trade-offs

Structure

Rate Limits

Per application replica, the maximum sustained ID generation rate is roughly 128 IDs per millisecond, or 128K per second. The rate limit makes it possible to have incremental IDs even intra-millisecond, without sacrificing the other benefits.

To reduce the impact of the rate limit, each replica can instantly generate a burst of up to 128K IDs. During reduced activity, consumed burst capacity is regained according to the unused portion of the normal maximum rate. For example, after one second of not generating any IDs, the burst capacity is back up to its full 128K IDs. The same is true after two seconds of generating at half capacity.
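The behavior described above resembles a token bucket. The following is a minimal model of it, assuming exactly 128 IDs per millisecond and a capacity of 128,000, and taking the current millisecond as a parameter so that it can be tested deterministically. BurstBudgetSketch is a hypothetical name, and the package's real bookkeeping may differ.

```csharp
using System;

// Hypothetical token-bucket model of the rate limit described above; not the package's code.
public sealed class BurstBudgetSketch
{
    public const long RatePerMs = 128;    // sustained rate: 128 IDs per millisecond
    public const long Capacity = 128_000; // burst capacity: one second's worth of IDs

    private long available = Capacity;
    private long lastRefillMs;

    public BurstBudgetSketch(long startMs) => lastRefillMs = startMs;

    // Regains the unused portion of the normal rate since the last refill, up to the capacity.
    private void Refill(long nowMs)
    {
        if (nowMs <= lastRefillMs) return;
        available = Math.Min(Capacity, available + (nowMs - lastRefillMs) * RatePerMs);
        lastRefillMs = nowMs;
    }

    public long GetAvailable(long nowMs)
    {
        Refill(nowMs);
        return available;
    }

    // Returns false when the budget is exhausted; a generator would then wait for a later millisecond.
    public bool TryConsume(long nowMs)
    {
        Refill(nowMs);
        if (available == 0) return false;
        available--;
        return true;
    }
}
```

After an exhausted budget idles for 1,000 ms, or spends 2,000 ms generating 64 IDs per millisecond, GetAvailable reports the full 128,000 again, matching the two examples in the text.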

Note that, in practice, cloud applications tend to scale out rather than up. Few applications require any single replica to generate over 128K IDs per second.

Collision Resistance

DistributedIds have strong collision resistance. The probability of generating the same ID twice is negligible for almost all contexts.

Most notably, collisions across different timestamps are impossible, since the millisecond values differ.

Within a single application replica, collisions during a particular millisecond are avoided (while maintaining the incremental nature) by reusing the previous random value (48 bits) and incrementing it by a smaller random value (41 bits). This guarantees unique IDs within the application replica, provided that the system clock is not adjusted backwards by more than 1 second. For larger backwards adjustments, the scenario is comparable to having an additional replica (addressed below) during the repeated time span.
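The reuse-and-increment idea can be sketched as follows. Assumptions: the increment is drawn uniformly from 1 to 2^41 - 1 (nonzero, so values strictly increase), and when the 48-bit space of a millisecond runs out, the sketch refuses rather than wrapping around. IntraMillisecondSequenceSketch is a hypothetical name, and the package's handling of that edge case and of clock adjustments is not shown here.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical sketch of the intra-millisecond scheme described above; not the package's code.
public sealed class IntraMillisecondSequenceSketch
{
    private const ulong Max48 = (1UL << 48) - 1;  // largest 48-bit value
    private const ulong Mask41 = (1UL << 41) - 1; // keeps the low 41 bits

    private long currentMs = long.MinValue;
    private ulong previous;

    private static ulong RandomBits(ulong mask)
    {
        Span<byte> bytes = stackalloc byte[8];
        RandomNumberGenerator.Fill(bytes); // cryptographically-secure randomness
        return BitConverter.ToUInt64(bytes) & mask;
    }

    // Produces the 48-bit random component for an ID on the given millisecond.
    // Returns false if this millisecond's 48-bit space is exhausted.
    public bool TryNext(long timestampMs, out ulong value)
    {
        if (timestampMs != currentMs)
        {
            // New millisecond: start from a fresh 48-bit random value.
            currentMs = timestampMs;
            previous = RandomBits(Max48);
            value = previous;
            return true;
        }

        // Same millisecond: add a nonzero random increment of at most 41 bits.
        ulong increment;
        do { increment = RandomBits(Mask41); } while (increment == 0);

        if (previous > Max48 - increment)
        {
            value = 0;
            return false;
        }

        previous += increment;
        value = previous;
        return true;
    }
}
```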

The only scenario in which collisions can occur is when multiple application replicas generate IDs on the same millisecond. It is detailed below, and its probability should be negligible.

The degenerate worst case

The chances of a collision occurring have been measured. Under the worst possible circumstances, they are as follows:

It is important to note that the above applies only to the degenerate scenario where all replicas generate IDs at the maximum rate per millisecond, always on the exact same millisecond. In practice, far fewer IDs tend to be generated per millisecond, spreading IDs out over more timestamps. This significantly reduces the realistic probability of a collision, to 1 per many trillions, a negligible number for the intended purposes.

Absolute Certainty

Luckily, we can protect ourselves even against the extremely unlikely event of a collision.

For contexts where even a single collision could be catastrophic, such as in certain financial domains, it is advisable to avoid "upserts", and always explicitly separate inserts from updates. This way, even if a collision did occur, it would merely cause one single transaction to fail (out of billions or trillions), rather than overwriting an existing record. This is good practice in general.
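In a real database, the same effect comes from a primary key constraint combined with a plain INSERT. The sketch below uses an in-memory Dictionary as a hypothetical stand-in for a table, to contrast an explicit insert (which fails on a duplicate ID) with an upsert (which, like the dictionary indexer, would silently overwrite).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a table, showing explicit inserts versus updates.
public sealed class InsertOnlyStoreSketch<TValue>
{
    private readonly Dictionary<decimal, TValue> rows = new Dictionary<decimal, TValue>();

    // An insert never overwrites: a duplicate ID fails loudly, like a primary key violation.
    public void Insert(decimal id, TValue value)
    {
        if (!rows.TryAdd(id, value))
            throw new InvalidOperationException($"A row with ID {id} already exists.");
    }

    // An update only touches a row that is known to exist.
    public void Update(decimal id, TValue value)
    {
        if (!rows.ContainsKey(id))
            throw new KeyNotFoundException($"No row with ID {id} exists.");
        rows[id] = value;
    }

    public TValue Get(decimal id) => rows[id];
}
```

If a colliding ID ever reached Insert, only that one operation would fail, and the existing row would remain intact.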

Alternatively, the DistributedId128 offers far greater collision resistance at the cost of an unwieldy format.

Guessability

Presupposing knowledge of the millisecond timestamp on which an ID was generated, the probability of guessing that ID is between 1/2^48 and 1/2^41, thanks to the 48-bit cryptographically-secure pseudorandom sequence. In practice, the timestamp component reduces guessability further, since on most milliseconds no IDs at all will have been generated.

The difference between the two probabilities (given knowledge of the timestamp) stems from the way the incremental property is achieved. If only one ID was generated on a timestamp, as tends to be common, the probability is 1/2^48. If the maximum number of IDs were generated on that timestamp, or if another ID from the same timestamp is known, an educated guess has a 1/2^41 probability of being correct.
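Written out, and assuming the attacker already knows the exact millisecond, the per-guess probabilities compare as follows (an informed guess is 2^7 = 128 times as likely to succeed as a blind one):

```latex
P_{\text{single}} = 2^{-48} \approx 3.55 \times 10^{-15},
\qquad
P_{\text{informed}} = 2^{-41} \approx 4.55 \times 10^{-13},
\qquad
P_{\text{public}} = 2^{-128} \approx 2.94 \times 10^{-39}
```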

To reduce the guessability to 1/2^128, see Public Identities.

Attack Surface

A DistributedId reveals its creation timestamp. Otherwise, it consists of cryptographically-secure pseudorandom data.

Entity Framework

When DistributedIds (or any decimal IDs) are used in Entity Framework, the column type needs to be configured. Although this can be done manually, the Architect.Identities.EntityFramework package facilitates conventions for this through its extension methods.

protected override void ConfigureConventions(ModelConfigurationBuilder configurationBuilder)
{
	base.ConfigureConventions(configurationBuilder);

	configurationBuilder.ConfigureDecimalIdTypes(modelAssemblies: typeof(SomeEntity).Assembly);
}

ConfigureDecimalIdTypes() uses precision 28, scale 0, and conversions to and from the decimal type (where necessary).

The conventions are applied to any entity properties named "*Id" or "*ID" whose type is either decimal or a decimal-convertible type, including nullable wrappers.
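To make the naming rule concrete, here is a hypothetical reflection-based sketch that selects the properties the rule describes. It is not the package's code, and for brevity it only recognizes decimal and decimal?, not other decimal-convertible types.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical illustration of the naming rule; the package's real conventions cover more types.
public static class DecimalIdConventionSketch
{
    public static bool IsDecimalIdProperty(PropertyInfo property)
    {
        // Name must end in "Id" or "ID" (ordinal, so "Paid" does not count).
        bool nameMatches = property.Name.EndsWith("Id", StringComparison.Ordinal)
            || property.Name.EndsWith("ID", StringComparison.Ordinal);

        // Unwrap decimal? to decimal; this sketch ignores other decimal-convertible types.
        Type type = Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType;
        return nameMatches && type == typeof(decimal);
    }

    public static string[] FindDecimalIdProperties(Type entityType) =>
        entityType.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(IsDecimalIdProperty)
            .Select(property => property.Name)
            .ToArray();
}

// Hypothetical entity used to exercise the rule.
public sealed class OrderSketch
{
    public decimal Id { get; set; }
    public decimal? CustomerID { get; set; }
    public decimal Amount { get; set; }
    public long LegacyId { get; set; }
}
```

For OrderSketch, this selects Id and CustomerID, but neither Amount (wrong name) nor LegacyId (wrong type).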

Optionally, the extension method takes any number of assemblies as input. From those assemblies, it finds all types named "*Id" or "*ID" that are decimal-convertible, and configures a DefaultTypeMapping for them using the same conventions.

A DefaultTypeMapping kicks in when the type appears in EF-generated queries where the context of a column is lost, such as when EF generates a call to CAST(). Without such a mapping, EF may choose to convert a decimal to some default precision, which is generally too low.

Alternatives

There exist various alternatives to the DistributedId. The most common feature they lack is the combination of being incremental and unpredictable, particularly for IDs generated on the same timestamp.

To highlight a few examples:

DistributedId128

The DistributedId128 is a 128-bit DistributedId variant that offers additional benefits at the cost of extra space and being more unwieldy than a simple decimal. It should be used if the requirements on generation rate or collision resistance are extreme (very high volumes) or unpredictable (class libraries). Apart from leaking the generation timestamp, this ID can serve as a drop-in replacement for the common version-4 random UUID.

Class libraries are a good example of products that should prefer the DistributedId128. They can make fewer assumptions, as their usage patterns often depend on the applications using them.

Additional Advantages

128-bit Structure

Public Identities

Sometimes, revealing even a creation timestamp is too much. For example, an ID might represent a bank account.

Still, it is desirable to have only a single ID, and preferably one that is efficient as a primary key. To achieve this, we can create a public representation of that ID, one that reveals nothing.

Example Value

Example Usage

// Startup.cs
public void ConfigureServices(IServiceCollection services)
{
	const string key = "k7fBDJcQam02hsByaOWPeP2CqeDeGXvrPUkEAQBtAFc="; // Strong 256-bit key, encoded in base-64 (in production, load it from a secret store instead of hardcoding it)
	
	services.AddPublicIdentities(publicIdentities => publicIdentities.Key(key));
}
public void ExampleUse(IPublicIdentityConverter publicIdConverter)
{
	decimal id = DistributedId.CreateId(); // 1088824355131185736905670087
	
	// Convert to public ID
	Guid publicId = publicIdConverter.GetPublicRepresentation(id); // 30322474-a954-ffa9-941c-6f038afe4ff1
	string publicIdString = publicId.ToAlphanumeric(); // "48XoooHHCe1CiOHrghM7Dl" (22 chars)
	
	// Convert back to internal ID
	decimal originalId = publicIdConverter.GetDecimalOrDefault(AlphanumericIdEncoder.DecodeGuidOrDefault(publicIdString) ?? Guid.Empty)
		?? throw new ArgumentException("Not a valid ID.");
}
public void ExampleHexadecimalEncoding(IPublicIdentityConverter publicIdConverter)
{
	decimal id = DistributedId.CreateId(); // 1088824355131185736905670087
	
	// We can use Guid's own methods to get a hexadecimal representation
	Guid publicId = publicIdConverter.GetPublicRepresentation(id); // 30322474-a954-ffa9-941c-6f038afe4ff1
	string publicIdString = publicId.ToString("N").ToUpperInvariant(); // "30322474A954FFA9941C6F038AFE4FF1" (32 chars)
	
	// Convert back to internal ID
	decimal originalId = publicIdConverter.GetDecimalOrDefault(new Guid(publicIdString))
		?? throw new ArgumentException("Not a valid ID.");
}

Implementation

Public identities are implemented by using AES encryption under a secret key. With the key, the original ID can be retrieved. Without the key, the data is indistinguishable from random noise. In fact, it is exactly the size of a UUID, and it can be formatted to look just like one, for that familiar feel.

Obviously, it is important that the secret key is kept safe. Moreover, the key must not be changed. Doing so would render any previously provided public identities invalid.

Forgery Resistance

Without possession of the key, it is extremely hard to forge a valid public identity.

When a public identity is converted back into the original ID, its structure is validated. If it is invalid, null or false is returned, depending on the method used.
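The following is a minimal sketch of the encrypt-then-validate idea for long IDs. The block layout is an assumption: an 8-byte little-endian ID followed by 8 zero bytes, encrypted as a single AES-256 block via EncryptEcb (.NET 6 or later). Because there is only one block, ECB here is simply one raw AES block operation. PublicIdSketch and its methods are hypothetical; this is not the package's format, and its outputs will not match the example values above.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical sketch for long IDs; the package's real format and API differ.
public sealed class PublicIdSketch : IDisposable
{
    private readonly Aes aes;

    public PublicIdSketch(byte[] key)
    {
        if (key is null || key.Length != 32)
            throw new ArgumentException("Expected a 256-bit key.", nameof(key));
        aes = Aes.Create();
        aes.Key = key;
    }

    public Guid ToPublicId(long id)
    {
        // Bytes 0-7 hold the ID; bytes 8-15 stay zero and serve as the structure check.
        var block = new byte[16];
        BinaryPrimitives.WriteInt64LittleEndian(block, id);
        return new Guid(aes.EncryptEcb(block, PaddingMode.None));
    }

    public long? FromPublicIdOrNull(Guid publicId)
    {
        byte[] block = aes.DecryptEcb(publicId.ToByteArray(), PaddingMode.None);
        for (int i = 8; i < 16; i++)
        {
            if (block[i] != 0) return null; // invalid structure: forged or corrupted
        }
        return BinaryPrimitives.ReadInt64LittleEndian(block);
    }

    public void Dispose() => aes.Dispose();
}
```

Flipping any bit of a public ID decrypts to an unrelated-looking block, which passes the zero-byte check with a probability of only 2^-64.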

For long and ulong IDs, the chance of forging some valid ID is 1/2^64. For decimal IDs, the chance is 1/2^32. Most importantly, even if a valid value were forged, the resulting internal ID would be a random one and would be extremely unlikely to match an existing one. This property makes even Guid and UInt128 IDs usable, despite the fact that they have no spare space for detecting forgeries.
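These figures follow from a simple count, assuming the ID occupies b bits of the 128-bit block, the remaining bits have a fixed expected value, and decrypting an arbitrary block yields effectively uniform bits (taking b = 96 for decimal, the size of System.Decimal's integer part):

```latex
P_{\text{forge}} = 2^{-(128 - b)}, \qquad
P_{\text{forge}}^{\text{long}} = 2^{-(128 - 64)} = 2^{-64}, \qquad
P_{\text{forge}}^{\text{decimal}} = 2^{-(128 - 96)} = 2^{-32}, \qquad
P_{\text{forge}}^{\text{Guid}} = 2^{-(128 - 128)} = 1
```

For Guid and UInt128 IDs every block decodes to some value, which is why their protection rests entirely on the decoded ID being random.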

Generally, when an ID is taken as client input, something is loaded based on that ID. As such, it is often best to simply turn an invalid public identity into a nonexistent local ID, such as 0:

// Any invalid input will result in id=0
long id = publicIdConverter.GetLongOrDefault(AlphanumericIdEncoder.DecodeGuidOrDefault(publicIdString) ?? Guid.Empty)
	?? 0L;

var entity = this.Repository.GetEntityById(id);
if (entity is null) return this.NotFound();