This README is intended to describe the Host Unique Identifier in-general and provide a comparison of HUID to the various versions of UUID repository is intended to house reference implementations in various languages.
A Host Unique Identifier (HUID) is a unique indentifier, similar to a Universally Unique Identifier (UUID), with optimizations for use as an indexed value or primary key.
UUID and HUID are both object identifiers intended to reduce the likelihood of multiple objects or records beign assigned the same identifier. Both consist of 5 fields and can be represented as a 36 character hyphen-delimited string, a 32 character hexadecimal value, or a 16 byte binary value. Use cases for UUID and HUID are similar; one common use case is as a primary key to enable simultaneous inserts into a distributes data store (such as an SQL database with multiple masters) without risk of collision.
There is nothing wrong with UUID per-se, but it doesn't do as much as it could
to prevent collisions and, for reasons mentioned below, is not well suited for
se in indexed fields, sich as a primary key. If your application is collision-
tolerant and the UUID will not be used in an indexed field (or INSERT
performance is not important), UUID is a fine choice.
The list below details what's wrong with UUID when used in applications that are not collision-tolerant, or as a record ID that will be indexed, by version:
- UUID v1 and v2
- Because there is no random component, there is a high likelihood of collision is a high volume of UUIDs are generated
- Because the leftmost field is the most volatile, these UUIDs are loosely- ordered, causing indexing slowdowns when inserting new records
- v1: Leftmost field is the last 32 bits of a 60-bit 100-nanosecond resolution timestamp, which cycles roughly once every 7.16 seconds
- v2: Leftmost field is the user's UID or GID; in addition to only being available on POSIX-compliant systems it is, for all practical intents, effectively random
- UUID v3 and v5
- These versions of UUID offer no protection against collision; though v5 uses a stronger hash, hash collisions are still possible and become more likely over time, as there is no time component to limit the collision window
- Since hashes are effectively unordered, the entire UUID is extremely volatile (rather than just the leftmost field as with v1 and v2), causing indexing slowdowns when inserting new records
- UUID v4
- This version of UUID offers no protection against collision; the likelihood of a collision between 122 random bits becomes more likely over time, as here is no time component to limit the collision window
- The requirement of 122 bits of entropy can quickly deplete the available entropy pool if a large number of UUIDs are needed
- Being entirely random, the entire UUID is extremely volatile (rather than just the leftmost field as with v1 and v2), causing indexing slowdowns when inserting new records
The two leftmost fields are time components, with the leftmost being the least volatile, cycling approximately once every 2.285 billion years and limiting the indexing window to HUIDs generated within one second under ideal conditions. The second field is a microsecond clock, limiting the collision window to HUIDs created within the same microsecond (and with the same namespace values). The 3rd and 4th fields are available to be used as namespaces, to identify the host and/or user for which the HIUD was generated. The 5th field is a random component, the purpose of which is to reduce the likelihood of collision (to 1:1048576) in the case that two HUIDs are generated with the same namespace fields in the same microsecond. Additionally, one or both of the namespace fields may also be repurposed as additional random component(s).
Due to the field ordering decisions made in the design of HUID, the indexing window is effectively limited to one second, plus the widest time differential between any two hosts generating HUIDs, plus network latency.
For more non-language-dependent information, please review to contents of the docs directory:
Language-dependent documentation can be found in the README.md in the reference subdirectory for each language.