Tuesday, August 12, 2008

Re: [PERFORM] Using PK value as a String

Bill Moran wrote:
The main reason to use UUID instead of sequences is if you want to be able to generate unique values across multiple systems. For example, you might want to send these userids to another system that is taking registrations from lots of places. Of course, that only works if that other system is already using UUIDs and you're all using good generators.
Note that in many circumstances, there are other options than UUIDs. If you have control over all the systems generating values, you can prefix each generated value with a system ID (e.g. make the high 8 bits the system ID and the remaining bits come from a sequence). This allows you to still use int4 or int8. UUID is designed to be a universal solution, but universal solutions are frequently less efficient than custom-tailored solutions.
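
As a rough illustration of the prefixing scheme described above, here is a minimal Python sketch. It assumes an int8-sized key with the high 8 bits as the system ID and the low 56 bits taken from a per-system sequence. The names make_key, split_key, and to_signed_int8 are made up for this example, and the signed conversion is only there because PostgreSQL's int8 is signed, so system IDs of 128 and above would otherwise not fit.

```python
# Hypothetical layout: 8-bit system ID in the high bits, 56-bit sequence value below it.
SYSTEM_ID_BITS = 8
SEQUENCE_BITS = 56
MAX_SYSTEM_ID = (1 << SYSTEM_ID_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def make_key(system_id: int, sequence_value: int) -> int:
    """Combine a system ID and a local sequence value into one 64-bit key."""
    if not 0 <= system_id <= MAX_SYSTEM_ID:
        raise ValueError("system_id does not fit in 8 bits")
    if not 0 <= sequence_value <= MAX_SEQUENCE:
        raise ValueError("sequence_value does not fit in 56 bits")
    return (system_id << SEQUENCE_BITS) | sequence_value


def split_key(key: int) -> tuple[int, int]:
    """Recover (system_id, sequence_value) from an unsigned 64-bit key."""
    return key >> SEQUENCE_BITS, key & MAX_SEQUENCE


def to_signed_int8(key: int) -> int:
    """Map an unsigned 64-bit key onto the signed range of a 64-bit integer column."""
    return key - (1 << 64) if key >= (1 << 63) else key


# Two systems can hand out the same local sequence value without colliding.
key_a = make_key(1, 1000)
key_b = make_key(2, 1000)
```

The cost of this approach is the one raised in the reply below: somebody has to hand out the system IDs and make sure no two systems ever share one.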

Other benefits include:
    - Reduced management cost. As described above, one would have to allocate keyspace in each system. By using a UUID, one can skip this step.
    - Increased keyspace. Even if keyspace allocation is performed, an int4 only has 32 bits of keyspace to allocate. As an example of how this can run out, the IPv4 address space is already over 85% allocated. 128 bits offer a LOT more keyspace than 32 bits or 64 bits.
    - Reduced sequence predictability. Certain forms of exploits that depend on the surrogate key being exposed to the public are rendered ineffective, as guessing the "next" or "previous" generated key is far more difficult.
    - Use as the key into a cache or other lookup table. Multiple types of records can be cached in the same storage, as the key is intended to be universally unique.
    - Flexibility to merge systems later, even if unplanned. For example, System A and System B are run independently for some time. Then, it is determined that they should be merged. If unique keys are specific to the system, this becomes far more difficult to implement than if the unique keys are universal.
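
To put rough numbers on the keyspace point and to illustrate the shared-cache point, here is a small Python sketch. It assumes random (version 4) UUIDs from Python's standard uuid module, which is only one possible generator. The helper cache_put and the sample records are made up for this example.

```python
import uuid

# Keyspace sizes for the three key widths discussed above.
INT4_KEYSPACE = 2 ** 32
INT8_KEYSPACE = 2 ** 64
UUID_KEYSPACE = 2 ** 128  # a version 4 UUID fixes 6 of these bits, leaving 122 random bits

# One lookup table shared by unrelated record types (hypothetical example data).
cache: dict[uuid.UUID, dict] = {}


def cache_put(record: dict) -> uuid.UUID:
    """Store a record under a freshly generated random UUID and return that key."""
    key = uuid.uuid4()
    cache[key] = record
    return key


# No per-type or per-system keyspace allocation is needed before inserting.
user_key = cache_put({"type": "user", "name": "alice"})
order_key = cache_put({"type": "order", "total": 42})
```

Note that the unpredictability benefit depends on the generator: it holds for random UUIDs like the ones above, while time-based UUIDs are easier to guess.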

That said, most uses of UUID do not require any of the above. It's a "just in case" measure that incurs the performance cost "just in case."

Cheers,
mark

--  Mark Mielke <mark@mielke.cc> 
