PostgreSQL: Re: [HACKERS] [WIP] collation support revisited (phase 1)

Monday, July 21, 2008

On Mon, Jul 21, 2008 at 03:15:56AM +0200, Radek Strnad wrote:
> I was trying to sort out the problem of not creating a new catalog for
> character sets and I came up with the following ideas. Correct me if my
> ideas are wrong.
>
> Since collation has to have a defined character set.

Not really. AIUI at least glibc and ICU define a collation over all
possible characters (i.e. Unicode). When you create a locale you take a
subset and use that. Think about it: if you want to sort strings and
one of them happens to contain a Chinese character, it can't *fail*.
Note that strcoll() has no error return for unknown characters.

> I'm suggesting to use the
> already written infrastructure of encodings and to use the list of encodings in
> chklocale.c. Currently databases are not created with a specified character
> set but with a specified encoding. I think instead of pointing a record in the
> collation catalog to another record in the character set catalog we might use
> only the name (string) of the encoding.

That's reasonable. From an abstract point of view collations and
encodings are orthogonal; it's only when you're using POSIX locales
that there are limitations on how you combine them. I think you can
assume a collation can handle any character that can be produced by
the encoding.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
