ID Encryptor
Encrypts long and UUID fields and optionally serializes them to and from JSON (using Jackson annotations).
Sequential ID
A commonly used and very practical surrogate identifier for an entity is a sequential number, called ID, generally assigned by the database itself. It is very efficient on a single system but is not unique in a distributed environment. One solution is to use very large random numbers (with a very small collision probability), or to generate identifiers that embed a node ID and a timestamp so that they are unique across the distributed system. In most cases the uniqueness constraint can be restricted to the context of the distributed application or to a specific entity in the application.
UUID
A UUID, Universally Unique IDentifier, is a 128-bit number. In its random form (version 4) 122 bits are random and 6 are reserved for the version and variant, which is big enough to make the probability of generating the same value twice extremely low. Unfortunately the same randomness that makes them unique interacts very badly with the B-Tree, the structure most databases use for their indexes, leading to poor performance (in both speed and space used). To overcome this pitfall the random version 4 UUID should be replaced by a time-ordered UUID version (such as the newer versions 6 and 7), which generates values in a monotonic sequence of 128-bit numbers (unfortunately version 1 is not really sequential even though it uses a node ID and a timestamp). Sequential UUIDs work well with database indexes but might embed information that leaks unwanted internal details. A UUID is also double the size of a long ID, which can affect performance and database size because it is replicated in indexes, caches, foreign keys, views, etc.
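The reserved bits mentioned above can be inspected with the standard `java.util.UUID` class. This is only a small sketch (the class name `UuidBitsDemo` is made up for the illustration): a version 4 UUID reports version 4 and the standard variant 2, leaving 122 bits for randomness.

```java
import java.util.UUID;

final class UuidBitsDemo {
    /** Bits left for randomness in a version 4 UUID: 128 minus 4 version bits and 2 variant bits. */
    static final int RANDOM_BITS_V4 = 128 - 4 - 2;

    /** Generates a random (version 4) UUID with the JDK generator. */
    static UUID randomV4() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = randomV4();
        // The version and variant are encoded in fixed positions of the 128 bits
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```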
TSID
TSID, Time Sorted ID, is a smaller version of UUID that is only 64 bits long. It has the same features as a full-blown UUID, with some constraints due to having far fewer bits available. A TSID can be encoded in a single long value and can be generated in node- and time-dependent sequences (again to fix the indexing problem). Being encoded as a long, from the point of view of an application a TSID is just like any other sequential ID (just not starting from 0 and not auto-generated by the database, which might be an advantage). Note that a TSID encoded as a long can exceed the largest integer that JavaScript represents exactly (2^53 - 1), so it must be exported as a string if it is consumed by JavaScript. Having much less room than a UUID, a TSID sequence can be guessed much more easily, and that could be exploited by an attacker (it also embeds the node ID and the creation time).
TSID security can be improved by encryption, which denies an attacker access to both the embedded information and the sequential values. Encryption should happen at the boundaries of the application (e.g. the REST API) because saving encrypted values in the database would trigger the indexing problem again (an encrypted value is basically random).
This project doesn't refer directly to TSID, for generality (TSIDs are effectively just longs), but offers plenty of ways to encrypt and serialize long values.
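To make the idea of a node- and time-dependent 64-bit sequence concrete, here is a minimal sketch of a TSID-like generator. The bit layout (42 bits of milliseconds, 10 bits of node ID, 12 bits of counter), the custom epoch and the class name are assumptions chosen for the illustration, not the layout of any specific TSID library.

```java
final class TimeSortedIdGenerator {
    private static final int NODE_BITS = 10;      // assumed: up to 1024 nodes
    private static final int COUNTER_BITS = 12;   // assumed: 4096 IDs per millisecond per node
    private static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;
    private static final long EPOCH_MILLIS = 1_577_836_800_000L; // 2020-01-01T00:00:00Z

    private final long nodeId;
    private long lastMillis = -1;
    private long counter = 0;

    TimeSortedIdGenerator(long nodeId) {
        if (nodeId < 0 || nodeId >= (1L << NODE_BITS)) {
            throw new IllegalArgumentException("nodeId out of range: " + nodeId);
        }
        this.nodeId = nodeId;
    }

    /** Returns a new ID, strictly greater than every ID previously returned by this generator. */
    synchronized long next() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            now = lastMillis;                     // clock moved backwards: stay monotonic
        }
        if (now == lastMillis) {
            counter = (counter + 1) & COUNTER_MASK;
            if (counter == 0) {
                now = lastMillis + 1;             // counter exhausted: borrow the next millisecond
            }
        } else {
            counter = 0;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (NODE_BITS + COUNTER_BITS))
                | (nodeId << COUNTER_BITS)
                | counter;
    }

    /** Extracts the node ID, showing that an unencrypted ID leaks it to anyone who knows the layout. */
    static long nodeOf(long id) {
        return (id >>> COUNTER_BITS) & ((1L << NODE_BITS) - 1);
    }
}
```

Because the timestamp occupies the high bits, the values sort by creation time, which is what keeps B-Tree inserts local. The same layout is also what makes the next value easy to guess.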
Encrypting Long ID
If we are all right with sequential long IDs internally but are worried about exposing them in the API, we can encrypt them (optionally embedding a seed) to make the next value in the sequence hard to guess. We have the option of using the full 64-bit value, necessarily exported as a string, or a 52-bit version compatible with JavaScript and so exported as a numeric value. Note that encrypted values (both strings and numbers) are scrambled, so ordering by the exported ID no longer makes sense.
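The reason a full 64-bit value has to travel as a string can be shown in plain Java, since a JavaScript Number is an IEEE 754 double. The sketch below is independent of this project (the class and method names are made up): it round-trips a long through a double and picks a JSON representation accordingly.

```java
final class JsSafeInteger {
    /** Largest integer a JavaScript Number represents exactly (Number.MAX_SAFE_INTEGER). */
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    static boolean isSafeForJavascript(long value) {
        return value >= -MAX_SAFE_INTEGER && value <= MAX_SAFE_INTEGER;
    }

    /** Round trip through a double, which is how JavaScript would store the number. */
    static long throughDouble(long value) {
        return (long) (double) value;
    }

    /** Exports a JSON number when it is safe, otherwise a quoted string. */
    static String toJson(long value) {
        return isSafeForJavascript(value) ? Long.toString(value) : "\"" + value + "\"";
    }

    public static void main(String[] args) {
        long tooBig = (1L << 53) + 1;
        System.out.println(tooBig + " -> " + throughDouble(tooBig)); // the last digit changes
        System.out.println(toJson(42L) + " " + toJson(Long.MAX_VALUE));
    }
}
```

A 52-bit value always stays below this limit, which is why it can be exported as a number.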
Exporting Long ID as a UUID
It is also possible, and supported, to create a UUID by combining the long ID with a fixed nodeId long (and optionally a fieldId, if it is important to have different UUIDs for the same ID on two different entities). Because the resulting UUID would not be robust against a guessing attack, it should be encrypted (there are methods provided). Note that an encrypted UUID would not be strictly valid, because encryption does not preserve the reserved version and variant bits.
Converting a long to a UUID is useful to adapt an application that was not designed to work in a distributed environment (e.g. several clients exporting data to a centralized application). It is also an interesting option for keeping the best performance from the local database while still being able to communicate in a distributed environment.
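As a rough illustration of the idea, the sketch below packs a node ID into the high 64 bits of a UUID and the long ID into the low 64 bits. This layout and the names `LongAsUuid`, `toUuid` and `toLong` are assumptions made for the example; the project's own layout (and its handling of fieldId) may well differ.

```java
import java.util.UUID;

final class LongAsUuid {
    /** Builds a UUID with the node ID in the high 64 bits and the long ID in the low 64 bits. */
    static UUID toUuid(long nodeId, long id) {
        return new UUID(nodeId, id);
    }

    /** Recovers the original long ID from a UUID built by toUuid. */
    static long toLong(UUID uuid) {
        return uuid.getLeastSignificantBits();
    }

    public static void main(String[] args) {
        // prints 00000000-0000-0007-0000-00000000002a
        System.out.println(toUuid(7L, 42L));
    }
}
```

The printed value shows why encryption is recommended: the ID and the node are readable at a glance and the next value is trivial to guess.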
Comparison
Choosing the right type of identifier involves many trade-offs, but it helps to notice that not all data needs to be treated in the same way and that different identifiers can coexist in the same application. For example, an application could use a universal identifier for the Invoice and a sequential long ID for its items, which are always exported together with the invoice and never addressed individually, so exporting their identifier doesn't really make sense.
Using UUIDs is expensive in terms of efficiency and complexity but might be the right choice for data that needs to be referenced by many different systems. A TSID encoded as a long can be a valid, and more efficient, substitute in almost all cases, except when uniqueness is required in its widest sense (Twitter uses an identifier similar to TSID for its tweets).
In a distributed system with many databases involved, using a universal identifier makes merging, data migration and database management in general much easier (your DBA will be grateful). The TSID is small and fast, but if you really need a full UUID always prefer the new UUID formats.
The sequential long ID is by far the most used, most efficient and easiest alternative. It can be encrypted in various ways to protect against guessing attacks and can be exported as a string, as a scrambled 52-bit number or even as a UUID. For a single application it is the obvious choice, but the tricks that make it adaptable to a distributed system don't really solve the problem where it is most crucial: at the database level. So it works as long as the data is supposed to stay segregated (no shared backup server, logging, migrations...).
In conclusion, TSID seems to be the best option for most cases, at least in distributed systems. Its only problems are that it is unique only in the context of the application, not universally, and that if too few bits are allocated to the counter it is much more guessable than a full-blown UUID (but it can be encrypted).
Encryption
The proposed encryption algorithms are Blowfish for long values (64 bits) and AES for UUIDs (128 bits), both used in ECB mode. ECB mode is discouraged for encrypting multi-block messages, but it is perfectly adequate when the data to encrypt is exactly one cipher block long.
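The single-block idea can be sketched with the standard `javax.crypto` API: a long is exactly one 64-bit Blowfish block and a UUID is exactly one 128-bit AES block, so no padding is needed. The class `SingleBlockCipher` and the use of a raw 16-byte key for both ciphers are assumptions for the illustration; this is not the project's code and it does not show how the project derives keys from a password.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class SingleBlockCipher {
    private final SecretKeySpec blowfishKey;
    private final SecretKeySpec aesKey;

    /** Assumption: one raw 16-byte key reused for both ciphers, to keep the sketch short. */
    SingleBlockCipher(byte[] key16) {
        if (key16.length != 16) throw new IllegalArgumentException("expected a 16 byte key");
        blowfishKey = new SecretKeySpec(key16, "Blowfish");
        aesKey = new SecretKeySpec(key16, "AES");
    }

    long encrypt(long id) { return cryptLong(Cipher.ENCRYPT_MODE, id); }
    long decrypt(long value) { return cryptLong(Cipher.DECRYPT_MODE, value); }
    UUID encrypt(UUID id) { return cryptUuid(Cipher.ENCRYPT_MODE, id); }
    UUID decrypt(UUID value) { return cryptUuid(Cipher.DECRYPT_MODE, value); }

    // A long is one 8-byte Blowfish block
    private long cryptLong(int mode, long value) {
        byte[] in = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        return ByteBuffer.wrap(run("Blowfish/ECB/NoPadding", blowfishKey, mode, in)).getLong();
    }

    // A UUID is one 16-byte AES block
    private UUID cryptUuid(int mode, UUID value) {
        byte[] in = ByteBuffer.allocate(16)
                .putLong(value.getMostSignificantBits())
                .putLong(value.getLeastSignificantBits())
                .array();
        ByteBuffer out = ByteBuffer.wrap(run("AES/ECB/NoPadding", aesKey, mode, in));
        long high = out.getLong();
        long low = out.getLong();
        return new UUID(high, low);
    }

    private static byte[] run(String transformation, SecretKeySpec key, int mode, byte[] block) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(mode, key);
            return cipher.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }
}
```

Note that ECB on a single block is deterministic: the same ID always encrypts to the same value under the same key, which is what allows the result to be used as a stable external identifier.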
Cache
All encryption, decryption, serialization and deserialization operations are memoized (cached) to improve speed. The cache can be accessed concurrently and uses weak references so that cached memory can be reclaimed by the GC.
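One possible shape for such a cache, using only the JDK, is a synchronized `WeakHashMap` wrapped in a small memoizer. This is a sketch under assumptions, not the project's implementation: the class `WeakMemoizer` is hypothetical, and `WeakHashMap` holds its keys (not its values) weakly.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class WeakMemoizer<K, V> {
    // Entries disappear once their key is no longer strongly referenced elsewhere
    private final Map<K, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<K, V> compute;

    WeakMemoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    /** Returns the cached value for the key, computing and storing it on the first request. */
    V get(K key) {
        return cache.computeIfAbsent(key, compute);
    }
}
```

A caller would wrap an expensive function once, for example `new WeakMemoizer<Long, String>(id -> expensiveEncrypt(id))` where `expensiveEncrypt` is a placeholder, and then call `get` instead of the function. Cached values must not hold a strong reference to their own key, otherwise the entry can never be reclaimed.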
Jackson serialization
ID encryption should happen at the API boundaries, and the serialization of data into JSON is a perfect place to do it (away from the service layer, which should know nothing about it). An added benefit is that all the annotated identifiers are automatically translated back and forth (and longs are converted into strings as needed). The project provides serialization and deserialization helpers for both longs and UUIDs, whether they are used as single fields, in lists or as keys in a map. An optional fieldId parameter is provided in the annotation for encrypting longs (it is ignored for UUIDs) to allow further scrambling, so that equal values in different fields are not encrypted to the same string (the same password is shared by all encryptable fields in the project).
There are annotations (those with LongAsUuid in the name) to transform a long ID into a UUID. They rely on an optional nodeId set in the factory and an optional fieldId set in the annotation to distinguish the UUID values of the same ID on different entities where this might be needed (so a Client with ID=2 will not have the same UUID as an Invoice with ID=2, which might be problematic for security reasons).
This is an example of a class to serialize with annotated fields; it should be quite self-explanatory:
public static class Bean {
    // serialized as an encrypted string
    @Encryptable(type = ExportType.String)
    Long encryptableLongValue = LONG_VALUE_1;
    // uses a nodeId=2 to encrypt to a different string than the previous
    @Encryptable(type = ExportType.String, nodeId = 2)
    Long encryptableLongValue2 = LONG_VALUE_1;
    // uses a nodeId=3 to encrypt to a different string than the previous
    @Encryptable(type = ExportType.String, nodeId = 3)
    Long encryptableLongValue3 = LONG_VALUE_1;
    // encrypts to a long number of 52 bit usable by javascript
    @Encryptable(type = ExportType.Long52Bit)
    Long encryptableLong52Value = LONG_VALUE_1;
    // encrypts to a UUID created from the long id
    // (optionally you can set a nodeId)
    @Encryptable(type = ExportType.LongAsUuid)
    Long encryptableLongAsUuidValue = LONG_VALUE_1;
    // not annotated: serialized as a plain long
    Long nonEncryptableLongValue = LONG_VALUE_1;

    // lists of longs: every element is encrypted
    @EncryptableCollection(type = ExportType.String)
    List<Long> encryptableLongList = LONG_LIST;
    @EncryptableCollection(type = ExportType.String, nodeId = 2)
    List<Long> encryptableLongList2 = LONG_LIST;
    @EncryptableCollection(type = ExportType.Long52Bit)
    List<Long> encryptableLong52List = LONG_LIST;
    @EncryptableCollection(type = ExportType.LongAsUuid)
    List<Long> encryptableLongAsUuidList = LONG_LIST;
    List<Long> nonEncryptableLongList = LONG_LIST;

    // maps with long keys: only the keys are encrypted
    @EncryptableKey(type = ExportType.String)
    Map<Long, String> encryptableLongMap = LONG_MAP;
    @EncryptableKey(type = ExportType.String, nodeId = 2)
    Map<Long, String> encryptableLongMap2 = LONG_MAP;
    @EncryptableKey(type = ExportType.Long52Bit)
    Map<Long, String> encryptableLong52Map = LONG_MAP;
    @EncryptableKey(type = ExportType.LongAsUuid)
    Map<Long, String> encryptableLongAsUuidMap = LONG_MAP;
    Map<Long, String> nonEncryptableLongMap = LONG_MAP;

    // UUID fields, lists and map keys
    @Encryptable(type = ExportType.Uuid)
    UUID encryptableUuidValue = UUID_VALUE_1;
    UUID nonEncryptableUuidValue = UUID_VALUE_2;
    @EncryptableCollection(type = ExportType.Uuid)
    List<UUID> encryptableUuidList = UUID_LIST;
    List<UUID> nonEncryptableUuidList = UUID_LIST;
    @EncryptableKey(type = ExportType.Uuid)
    Map<UUID, String> encryptableUuidMap = UUID_MAP;
    Map<UUID, String> nonEncryptableUuidMap = UUID_MAP;
}
Of course, ID parameters passed in the URL must be converted manually.
@GetMapping("/invoices/{customerId}")
public List<Invoice> getInvoices(@PathVariable String customerId) {
    // decrypt the exported string back into the internal long ID
    Long userId = EncryptorsHolder.decryptLong(customerId);
    return accountingService.getInvoicesOfCustomer(userId);
}
Don't forget to initialize the EncryptorsHolder: there are several init methods available with different options.
EncryptorsHolder.initEncryptorsWithPassword("abracadabra");
References
Refer to this blog post for further details.