Tuesday, July 8, 2008

Re: [HACKERS] Identifier case folding notes

Peter Eisentraut wrote:
> I have had some idle thoughts on the issue of identifier case folding. Maybe
> we can collect our ideas and inch closer to a solution sometime. Or we
> determine that it's useless and impossible, but then I can at least collect
> that result in the wiki and point future users to it.
>
> Background: PostgreSQL folds unquoted identifiers to lower case. SQL
> specifies folding to upper case, and most other SQL DBMS do it that way. If
> an application mixes referring to an object using quoted and unquoted
> versions of an identifier, then incompatibilities arise. We have always
> stressed to users that one should refer to an object always unquoted or
> always quoted. While that remains a good suggestion for a number of reasons,
> we have seen occasional complaints that some closed source applications
> violate this rule and therefore cannot be run on PostgreSQL.
>
> A number of solutions have been proposed over time, which I summarize here:
>
> 1. Change the lexer to fold to upper case, as it is supposed to do according
> to the SQL standard. This will break almost everything, because almost all
> built-in objects have lower-case names and thus couldn't be referred to
> anymore except by quoting. Changing the names of all the internal objects to
> upper-case names would involve vast code changes, probably break just as much,
> and make everything uglier. So this approach is unworkable.
>
> 2. Fold to upper case, but not when referring built-in objects. The lexer
> generally doesn't know what a name will refer to, so this is not possible to
> implement, at least without resorting to lots of hard-coding or horrid
> kludges. Also, a behavior like this will probably create all kinds of weird
> inconsistencies, resulting from putting catalog knowledge in the lexer.
>
> 3 and 4 -- Two variants of "ignore case altogether":
>
> 3. Fold all identifiers to lower case, even quoted ones. This would probably
> in fact fix the breakage of many of the above-mentioned problem applications,
> and it would in general be very easy to understand for a user. And it could
> be implemented in about three lines. One disadvantage is that one could no
> longer have objects that have names different only by case, but that is
> probably rare and incredibly stupid and can be neglected. The main
> disadvantage is that the case of identifiers and in particular column labels
> is lost. So applications and programming interfaces that look up result
> columns in a case-sensitive manner would fail. And things like SELECT expr AS
> "Nice Heading" won't work properly anymore.
>
> 4. Compare the "name" data type in a case-insensitive manner. This would
> probably address most problem cases. Again, you can't have objects with names
> different in case only. One condition to implementing this would be that this
> behavior would have be tied down globally at initdb, because it affects system
> indexes and shared catalogs. That might be impractical for some, because
> you'd need different instances for different behaviors, especially when you
> want to host multiple applications or want to port an affected application to
> the native PostgreSQL behavior over time.
>
> 5. One additional approach I thought of is that you swap the case of
> identifiers as you lex them (upper to lower, lower to upper), and then swap
> them back when you send them to the client. This needs a small change in the
> lexer, one for sending the RowDescription, and support in pg_dump and a few
> other places if desired. There will, however, be a number of weird, albeit
> self-imposed, side-effects. I have implemented a little test patch for this.
> It's weird, but it works in basic ways.
>
> Obviously, no solution will ever work completely. And we probably don't want
> such a solution, because it would create two different and incompatible
> PostgreSQL universes. If we are aiming for a solution that would allow most
> affected applications to hobble along, we would probably serve most users.
> Implementing some or all of 3, 4, and 5 would probably achieve that.
>
>


I'm not sure if you've read all the archive history on this. Here are
the pointers from the TODO list:

http://archives.postgresql.org/pgsql-hackers/2004-04/msg00818.php

http://archives.postgresql.org/pgsql-hackers/2006-10/msg01527.php

http://archives.postgresql.org/pgsql-hackers/2008-03/msg00849.php


The fact is that we have substantial groups of users who want different
things:
. current users who want no change so there is no breakage in existing apps
. users on other DBs who want Spec compliance
. users on yet other DBs who want case preservation

The last group should not be lightly dismissed - it is quite common
behaviour on MSSQL as well as on MySQL, so we have some incentive to
make this possible to encourage migration.

I'm strongly of the opinion therefore that this should be behaviour
determined at initdb time (can't make it later because of shared
catalogs). I suspect that we won't be able to do all this by simple
transformations in the lexer, unlike what we do now. But I do think it's
worth doing.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: