Tuesday, July 15, 2008

Re: [HACKERS] Fwd: Proposal - UUID data type

First - please stop copying this list - this is not the "convince Jerry to include UUID in MySQL" mailing list.

Really - I don't care what he thinks. But, on the subjects themselves and how they apply to *PostgreSQL*:

 Non-standard features just force people to stick with that one product.   In the long run, the only people who benefit are the product developers.   

I chose PostgreSQL over MySQL because it provided numerous features - both standard and non - that I needed on the day I made my decision. I don't care about the long run as a user. One might as well say 90% of the world is wrong for using Microsoft products, because it locks one into Microsoft. One can say this - and people do say this - but none of this changes the fact that 90% of the world is relatively happy with their choice. They voted with their dollars. All decisions should be made on a cost-benefit analysis - they should not be based on some arbitrary code like "I will not choose a solution that locks me in".

Additionally - in the context of MySQL - the main reason I chose PostgreSQL over MySQL is because it provided things like CREATE VIEW, which MySQL did not at the time. People such as Jerry can pretend that standards guarantee that a feature is in all products, but it seems quite clear that just because something is a standard does NOT mean it is implemented the same everywhere, or even at all. At the time I chose PostgreSQL it was my opinion that PostgreSQL was far more standards-compliant than MySQL was going to be for at least a few years. I am glad I came to the correct conclusion. MySQL implemented ACID as an after-thought. I mean - comone.

This is incorrect. UUID at 16 bytes is already "long" in terms of being used as a primary index. In an 8K page, one can only fit 512 UUIDs (forgetting the requirement for headers) - if it was stored as 32 bytes - or 36 bytes, or 40 bytes (with punctuation), it would be at less than 256 UUIDs per page. For a join table joining one set of UUID to another set, that's < 256 vs < 128. Doubling the size of an index row roughly doubles the time to look up the value.     
 Incorrect.  Doubling the size of the index has very little effect on how long it takes to look up a value.  Intelligent databases use a binary search so doubling the size only means one additional comparison need be done.  And heavily used indexes are generally cached in memory anyway.   

Wrong. A binary search that must read double the number of pages, and compare double the number of bytes, will take double the amount of time. There are factors that will reduce this, such as if you assume that most of the pages are in memory or cache memory, therefore the time to read the page is zero, therefore it's only the time to compare bytes - but at this point, the majority of the time is spent comparing bytes, and it's still wrong. If we add in accounting for the fact that UUID is compared using a possibly inlined memcpy() compared to treating it as a string where it is variable sized, and much harder to inline (double the number of oeprations), and it's pretty clear that the person who would make such a statement as above is wrong.

As another poster wrote - why not double the size of all other data structures too. It costs nothing, right?

Why does MySQL have a 3-byte integer support if they truly believe that saving 1 byte in 4 doesn't result in a savings for keys?

Cheers,
mark

--  Mark Mielke <mark@mielke.cc> 

No comments: