Massive Technical Interviews Tips: UUID Misc

Monday, October 5, 2015

UUID Misc

https://segment.com/blog/a-brief-history-of-the-uuid/
KSUID is an abbreviation for K-Sortable Unique IDentifier. It combines the simplicity and security of UUID Version 4 with the lexicographic k-ordering properties of Flake.

KSUIDs are larger than UUIDs and Flake IDs, weighing in at 160 bits. They consist of a 32-bit timestamp and a 128-bit randomly generated payload. The uniqueness property does not depend on any host-identifiable information or the wall clock. Instead it depends on the improbability of random collisions in such a large number space, just like UUID Version 4. To reduce implementation complexity, the 122 random bits of UUID Version 4 are rounded up to 128 bits, making it 64 times (2⁶) more collision resistant as a bonus, even when the additional 32-bit timestamp is not taken into account.
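
As a rough illustration of that layout, the sketch below packs a 32-bit timestamp followed by 16 random bytes into a 20-byte array. The class and method names are hypothetical, the epoch offset of 1,400,000,000 seconds is an assumption about the reference KSUID implementation rather than something stated above, and the text encoding of the ID is left out.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

// Hypothetical sketch of the KSUID byte layout, not the reference implementation.
final class KsuidSketch {
    // Assumption: seconds subtracted from Unix time before storing the timestamp.
    static final long EPOCH_OFFSET_SECONDS = 1_400_000_000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns 20 bytes: 4-byte big-endian timestamp, then 16 random bytes.
    static byte[] generate(long unixSeconds) {
        byte[] payload = new byte[16];
        RANDOM.nextBytes(payload);
        return ByteBuffer.allocate(20)
                .putInt((int) (unixSeconds - EPOCH_OFFSET_SECONDS))
                .put(payload)
                .array();
    }

    // Recovers the Unix time in seconds from the leading 4 bytes.
    static long timestampOf(byte[] ksuid) {
        return Integer.toUnsignedLong(ByteBuffer.wrap(ksuid).getInt()) + EPOCH_OFFSET_SECONDS;
    }

    public static void main(String[] args) {
        byte[] id = generate(System.currentTimeMillis() / 1000L);
        System.out.println(id.length + " bytes, created at " + timestampOf(id));
    }
}
```

Because the timestamp occupies the leading bytes, comparing two such arrays byte by byte (unsigned) orders them by creation second first, which is where the k-sortable property comes from.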

https://en.wikipedia.org/wiki/Universally_unique_identifier
The "nil" UUID, a special case, is the UUID 00000000-0000-0000-0000-000000000000; that is, all bits set to zero.
Version 1 (date-time and MAC address)
Version 2 (date-time and MAC Address, DCE security version)

Version 3 and 5 UUIDs are generated by hashing a namespace identifier and name. Version 3 uses MD5 as the hashing algorithm, and version 5 uses SHA-1.

The namespace identifier is itself a UUID. The specification provides UUIDs to represent the namespaces for URLs, fully qualified domain names, object identifiers, and X.500 distinguished names; but any desired UUID may be used as a namespace designator.

Version 3 and 5 UUIDs have the property that the same namespace and name will map to the same UUID. However, neither the namespace nor name can be determined from the UUID, given the other, except by brute-force search.
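
java.util.UUID has no built-in version 5 generator, so the following is a minimal sketch of one, assuming the usual recipe: hash the 16 namespace bytes followed by the name with SHA-1, keep the first 16 bytes, then overwrite the version and variant bits. The class and method names are hypothetical; the DNS namespace constant is the one defined in the UUID specification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical sketch of a name-based (version 5) UUID generator.
final class NameBasedUuidSketch {
    // Namespace UUID for fully qualified domain names, as given in the specification.
    static final UUID NAMESPACE_DNS = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    static UUID version5(UUID namespace, String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Hash input is the namespace UUID in big-endian byte order, then the name.
            sha1.update(ByteBuffer.allocate(16)
                    .putLong(namespace.getMostSignificantBits())
                    .putLong(namespace.getLeastSignificantBits())
                    .array());
            byte[] hash = sha1.digest(name.getBytes(StandardCharsets.UTF_8));
            hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // version nibble = 5
            hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // variant bits = 10
            ByteBuffer first16 = ByteBuffer.wrap(hash, 0, 16);
            return new UUID(first16.getLong(), first16.getLong());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // The same namespace and name always map to the same UUID.
        System.out.println(version5(NAMESPACE_DNS, "example.org"));
        System.out.println(version5(NAMESPACE_DNS, "example.org"));
    }
}
```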

A version 4 UUID is randomly generated. As in other UUIDs, four bits are used to indicate version 4, and 2 or 3 bits to indicate the variant (10 or 110 for variants 1 and 2, respectively). Thus, for variant 1 (that is, most UUIDs) a random version 4 UUID will have 6 predetermined variant and version bits, leaving 122 bits for the randomly generated part, for a total of 2¹²², or 5.3×10³⁶ (5.3 undecillion), possible version 4 variant 1 UUIDs.
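
The fixed bits can be checked directly with java.util.UUID. The sketch below (hypothetical class name) prints the version and variant of a random UUID and computes 2¹²². Note that java.util.UUID.variant() reports the value 2 for the layout whose bit pattern is 10, which the text above calls variant 1.

```java
import java.math.BigInteger;
import java.util.UUID;

// Hypothetical sketch: inspect the fixed bits of a version 4 UUID.
final class Version4Bits {
    // 128 bits minus 4 version bits minus 2 variant bits leaves 122 random bits.
    static BigInteger possibleValues() {
        return BigInteger.ONE.shiftLeft(128 - 4 - 2);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
        System.out.println("possible version 4 UUIDs: " + possibleValues());
    }
}
```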

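You can use UUID this way to always get the same UUID for a given input String. nameUUIDFromBytes produces a version 3 (MD5-based) UUID; passing an explicit charset keeps the result identical across platforms:

 import java.nio.charset.StandardCharsets;
 import java.util.UUID;
 String aString = "JUST_A_TEST_STRING";
 String result = UUID.nameUUIDFromBytes(aString.getBytes(StandardCharsets.UTF_8)).toString();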

How to generate time based UUIDs?

The version 4 UUID is a UUID based on random bytes. We fill the 128 bits with random bits (6 of the bits are then overwritten to flag the version and variant of the UUID). No special configuration or implementation decisions are required to generate version 4 UUIDs.

The version 1 UUID is a combination of node identifier (MAC address), timestamp and a random seed. The version one generator uses the commons-discovery package to determine the implementation. The implementations are specified by system properties.

https://github.com/cowtowncoder/java-uuid-generator

JDK's java.util.UUID has a flawed implementation of compareTo(), which uses a naive signed comparison of 64-bit values. This does NOT work as expected, given that the underlying content is for all purposes unsigned. For example, two UUIDs:

7f905a0b-bb6e-11e3-9e8f-000000000000
8028f08c-bb6e-11e3-9e8f-000000000000

would be ordered with the second one first, because the comparison is signed (the upper 64 bits of the second value have the top bit set, so they are treated as negative, and hence "smaller").

Because of this, you should always use an external comparator, such as com.fasterxml.uuid.UUIDComparator, which implements the expected sorting order: simple unsigned sorting, which is also the same as lexicographic (alphabetic) sorting of UUIDs (assuming uniform capitalization).
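
To make the difference visible without pulling in the library, here is a minimal sketch of an unsigned comparator built on Long.compareUnsigned. It is a stand-in written for illustration, not the code of UUIDComparator, and the class name is hypothetical.

```java
import java.util.Comparator;
import java.util.UUID;

// Hypothetical sketch: order UUIDs by treating both 64-bit halves as unsigned.
final class UnsignedUuidOrder {
    static final Comparator<UUID> UNSIGNED = (a, b) -> {
        int high = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return high != 0
                ? high
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    };

    public static void main(String[] args) {
        UUID first = UUID.fromString("7f905a0b-bb6e-11e3-9e8f-000000000000");
        UUID second = UUID.fromString("8028f08c-bb6e-11e3-9e8f-000000000000");
        // Print both results side by side to compare the JDK ordering with the unsigned one.
        System.out.println("compareTo: " + first.compareTo(second));
        System.out.println("unsigned:  " + UNSIGNED.compare(first, second));
    }
}
```

With the unsigned comparator the first UUID sorts before the second, matching the alphabetic order of their string forms.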

https://en.wikipedia.org/wiki/Universally_unique_identifier

The challenge with a UUID is to make it be unique for multiple processes running on a single machine and multiple threads running in a single process. The Type 1 UUID as specified above does neither. On a fast machine with multiple cores it is quite possible to have a UUID generated with the same time value. This can be remedied only if the sequence number can span threads and processes, something that is quite challenging to do efficiently.

The Time Based UUID referenced compensates for these issues by:

Only using the normal millisecond granularity returned by System.currentTimeMillis() and adjusting it to pretend to contain 100 ns counts.
Incrementing the time by 1 (in a non-threadsafe manner) whenever a duplicate time value is encountered.
Using a pseudo-random number associated with the UUID Class for the sequence number.
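
The first two steps can be sketched as follows. This is a variation rather than the referenced class: it scales milliseconds to 100 ns counts and bumps the value by 1 on a duplicate, but does so through an AtomicLong so the increment is thread-safe. All names are hypothetical; the offset constant is the number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch, and the bit layout follows the version 1 format.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a millisecond-based, strictly increasing version 1 timestamp.
final class TimeBasedSketch {
    // 100 ns intervals from 1582-10-15 to 1970-01-01.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private final AtomicLong lastTimestamp = new AtomicLong();

    // Pretend the millisecond clock has 100 ns resolution; add 1 when the value repeats.
    long nextTimestamp(long currentTimeMillis) {
        long candidate = currentTimeMillis * 10_000L + UUID_EPOCH_OFFSET;
        return lastTimestamp.updateAndGet(previous -> Math.max(previous + 1, candidate));
    }

    // Packs timestamp, clock sequence and node into the version 1 field layout.
    static UUID toVersion1(long timestamp, int clockSequence, long node) {
        long msb = (timestamp << 32)                          // time_low
                | ((timestamp & 0xFFFF_0000_0000L) >>> 16)    // time_mid
                | 0x1000L                                     // version 1
                | ((timestamp >>> 48) & 0x0FFFL);             // time_hi
        long lsb = 0x8000_0000_0000_0000L                     // variant bits 10
                | ((long) (clockSequence & 0x3FFF) << 48)
                | (node & 0xFFFF_FFFF_FFFFL);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        TimeBasedSketch clock = new TimeBasedSketch();
        long now = System.currentTimeMillis();
        // Two calls within the same millisecond still yield distinct timestamps.
        System.out.println(toVersion1(clock.nextTimestamp(now), 0, 0x0123456789ABL));
        System.out.println(toVersion1(clock.nextTimestamp(now), 0, 0x0123456789ABL));
    }
}
```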

https://raw.githubusercontent.com/cowtowncoder/java-uuid-generator/3.0/release-notes/FAQ
Libraries
http://johannburkard.de/software/uuid/
UUID generates version 1 UUIDs that contain the MAC address of a network card.
https://wiki.apache.org/cassandra/TimeBaseUUIDNotes
https://github.com/cowtowncoder/java-uuid-generator
https://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java

http://www.java2s.com/Code/Java/Development-Class/YourownUUID.htm
Time series in Cassandra and DynamoDB

The partition key + range key needs to be unique. For my timeseries data it seemed pretty obvious that my range key needed to be the timestamp, and the partition key could be anything else about the event. But how do I make this unique? Theoretically an API consumer could issue two identical requests at once, so I might not have unique events.

Cassandra came to the rescue here. Cassandra has a timeuuid data type. This is a type 1 (time-based) UUID with special ordering behaviour. Cassandra takes your timestamp plus some magic to make it unique and saves that. When asked to order by one of these fields it looks first at the parts of the UUID which contain the time so the events are ordered as you expect. It also has functions to determine the greatest and least possible timeuuid values for a given time, so you can use these to build a query that contains a time range.
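
The idea of comparing the time fields first can be illustrated in plain Java. This sketch is not Cassandra's implementation; it is a hypothetical comparator that orders version 1 UUIDs by UUID.timestamp() and only then by the remaining bits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch: order version 1 UUIDs chronologically.
final class TimeOrderSketch {
    // timestamp() is a 60-bit value, so a signed comparison is safe here.
    static final Comparator<UUID> BY_TIME = (a, b) -> {
        int byTime = Long.compare(a.timestamp(), b.timestamp());
        return byTime != 0
                ? byTime
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    };

    // Returns a sorted copy; timestamp() throws for UUIDs that are not version 1.
    static List<UUID> sortByTime(List<UUID> ids) {
        List<UUID> copy = new ArrayList<>(ids);
        copy.sort(BY_TIME);
        return copy;
    }

    public static void main(String[] args) {
        List<UUID> ids = new ArrayList<>();
        ids.add(UUID.fromString("8028f08c-bb6e-11e3-9e8f-000000000000"));
        ids.add(UUID.fromString("7f905a0b-bb6e-11e3-9e8f-000000000000"));
        System.out.println(sortByTime(ids));
    }
}
```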

You should include some time element in your partition key. I started by using the date: my partition key is now the application ID plus the date, and the range key is still the full (unique-ified) timestamp.

The solution I came up with was to include a sharding factor in the partition key. I also changed the date part to be just the year and month, not the day. So my partition key is now Application ID + year + month + sharding factor. The sharding factor is simply a counter, modulo an application dependent sharding modulus.
http://javadox.com/com.netflix.astyanax/astyanax-core/1.56.37/com/netflix/astyanax/util/TimeUUIDUtils.html
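
A partition key of the shape application ID + year + month + sharding factor might be built like this. The class name, the colon separator and the string format are all assumptions made for the example; only the components of the key come from the description above.

```java
import java.time.YearMonth;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: partition key with a counter-based sharding factor.
final class PartitionKeySketch {
    private final int shardingModulus;
    private final AtomicLong counter = new AtomicLong();

    PartitionKeySketch(int shardingModulus) {
        if (shardingModulus <= 0) {
            throw new IllegalArgumentException("shardingModulus must be positive");
        }
        this.shardingModulus = shardingModulus;
    }

    // The sharding factor is a counter taken modulo the application-dependent modulus.
    String next(String applicationId, YearMonth month) {
        long shard = Math.floorMod(counter.getAndIncrement(), (long) shardingModulus);
        return applicationId + ":" + month + ":" + shard;
    }

    public static void main(String[] args) {
        PartitionKeySketch keys = new PartitionKeySketch(4);
        for (int i = 0; i < 5; i++) {
            System.out.println(keys.next("my-app", YearMonth.of(2015, 10)));
        }
    }
}
```

Reads for a given month then have to query every shard value from 0 up to the modulus minus 1 and merge the results, which is the price paid for spreading the writes.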

http://www.sohamkamani.com/blog/2016/10/05/uuid1-vs-uuid4/
The universally unique identifier, or UUID, was designed to provide a consistent format for any ID we use for our data. Another problem UUIDs were meant to solve was to avoid giving a potential adversary any information about the data an ID represents.

This is a tradeoff between uniqueness and randomness that is represented by v1 and v4 of UUID generators.

V1 : Uniqueness

UUID v1 is generated by using a combination of the host computer's MAC address and the current date and time. In addition to this, it also introduces another random component just to be sure of its uniqueness.

This means you are guaranteed to get a completely unique ID, unless you generate it from the same computer, and at the exact same time. In that case, the chance of collision changes from impossible to very very small because of the random bits.

This guaranteed uniqueness comes at the cost of anonymity. Because UUID v1 takes the time and your MAC address into consideration, this also means that someone could potentially identify the time and place (i.e. computer) of creation. If you generate several v1 UUIDs on the same machine, you will see that some part of each one stays constant.
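
How much a v1 UUID gives away can be shown with java.util.UUID alone. The sketch below (hypothetical class name) converts the embedded timestamp back to an Instant and prints the node field, which holds the MAC address when the generator used one. The example UUID is an arbitrary version 1 value with an all-zero node.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical sketch: recover creation time and node from a version 1 UUID.
final class Version1Inspector {
    // 100 ns intervals from the UUID epoch (1582-10-15) to the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    static Instant createdAt(UUID v1) {
        long hundredNanos = v1.timestamp() - UUID_EPOCH_OFFSET;
        return Instant.ofEpochSecond(hundredNanos / 10_000_000L,
                (hundredNanos % 10_000_000L) * 100L);
    }

    // The 48-bit node field as 12 hex digits.
    static String nodeHex(UUID v1) {
        return String.format("%012x", v1.node());
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("7f905a0b-bb6e-11e3-9e8f-000000000000");
        System.out.println("created at " + createdAt(id) + " on node " + nodeHex(id));
    }
}
```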

V4 : Randomness

The generation of a v4 UUID is much simpler to comprehend. Apart from the six bits that mark the version and variant, every bit of a UUID v4 is generated randomly and with no inherent logic. It's that simple. There is, therefore, no question of anonymity.

However, there is now a chance that a UUID could be duplicated. The question is, do you need to worry about it?

The short answer is no. With the sheer number of possible combinations (2^122, since six of the 128 bits are fixed), it would be almost impossible to generate a duplicate unless you are generating on the order of a billion IDs every second for decades. This is a laughable standard for any application in today's world, and not substantial enough to take into consideration.
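
To put rough numbers on that, the sketch below uses the standard birthday approximation p ≈ 1 − exp(−n² / 2N), assuming N = 2^122 equally likely values for a version 4 UUID. The class and method names are hypothetical, and the results are estimates rather than exact probabilities.

```java
// Hypothetical sketch: birthday-bound estimates for version 4 UUID collisions.
final class CollisionEstimate {
    // Number of distinct version 4 UUIDs (122 random bits).
    static final double RANDOM_VALUES = Math.pow(2, 122);

    // Approximate probability of at least one collision among n random UUIDs.
    static double collisionProbability(double n) {
        return -Math.expm1(-(n * n) / (2 * RANDOM_VALUES));
    }

    // Approximate number of UUIDs needed to reach collision probability p.
    static double idsForProbability(double p) {
        return Math.sqrt(2 * RANDOM_VALUES * -Math.log1p(-p));
    }

    public static void main(String[] args) {
        System.out.println("p(collision) for 1e12 IDs: " + collisionProbability(1e12));
        double idsForHalf = idsForProbability(0.5);
        double secondsPerYear = 365.25 * 24 * 3600;
        System.out.println("IDs for a 50% chance: " + idsForHalf);
        System.out.println("years at 1e9 IDs/s: " + idsForHalf / 1e9 / secondsPerYear);
    }
}
```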

If you actually want your UUID to give some indication of the date and computer in which it was created, then UUID v1 may be for you.

https://stackoverflow.com/questions/2201558/sequential-guid-in-java

For sequential UUIDs, you are looking for a version 1 UUID. Java UUID Generator project seems to work quite well and is pretty easy to use:

Generators.timeBasedGenerator().generate().toString()
