PostgreSQL: Re: [HACKERS] Proposal - Collation at database level

Saturday, May 31, 2008

Re: [HACKERS] Proposal - Collation at database level

Radek Strnad napsal(a):
> Zdenek Kotala wrote:
>> Radek Strnad napsal(a):
>>
>> <snip>
>>
>>>
>>> I'm thinking of dividing the problem into two parts - in beginning
>>> pg_collation will contain two functions. One will have hard-coded rules
>>> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
>>> UCS_BASIC). It will compare each string character bitwise and guarantee
>>> that the implementation will meet the SQL standard implemented in
>>> PostgreSQL.
>>> Second one will allow the user to use installed system locales. The set
>>> of these collations will obviously vary between systems. Catalogs will
>>> contain encoding and collation for calling the system locale function.
>>> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
>>> etc. if they will be availible.
>>>
>>> We will also need to change the way how strings are compared. Regarding
>>> the set database collation the right function will be used.
>>> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675

>>>
>>>
>>> This design will make possible switch to ICU or any other implementation
>>> quite simple and will not cause any major rewriting of what I'm coding
>>> right now.
>>
>>
>> Collation function is main point here. How you mentioned one will be
>> only wrapper about strcmp and second one about strcoll. (maybe you
>> need four - char/wchar) Which function will be used it is defined in
>> pg_collation catalog by CREATE COLLATION command. But you need specify
>> name of locale for system locales. It means you need attribute for
>> storing locale name.
>>
> You're right. I've extended pg_collation for system locale columns. In
> the first stage we actually don't need any other catalogs such as
> encoding, etc. and we can build this functionality only on following
> pg_collation catalog. Used collation function (system or built-in) will
> be decided on existing collation name.
>
> CATALOG(pg_collations, ###)
> {
> NameData colname; /* collation name */
> Oid colschema; /* collation schema */
> NameData colcharset; /* character set specification */
> Oid colexistingcollation; /* existing collation */
> bool colpadattribute; /* pad attribute */
> bool colcasesensitive; /* case sensitive */
> bool colaccent; /* accent sensitive */
> NameData colsyslccollate; /* lc_collate */
> NameData colsyslcctype; /* lc_ctype */
> regproc colfunc; /* used collation function */
> } FormData_pg_collations;
>
>
>>> FormData_pg_collations;
>> It would be good to send list of new and modified SQL commands (like
>> CREATE COLLATION) for wide discussion.
>>
> CREATE COLLATION <collation name> FOR <character set specification> FROM
> <existing collation name> [ <pad characteristic> ] [ <case sensitive> ]
> [ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]
>
> <pad characteristic> := NO PAD | PAD SPACE
> <case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
> <accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE
>
> Since you can specify order by in select clause there's no need for
> adding ascending and descending type of collation. They will allways be
> ascending.
>
> DROP COLLATION <collation name>
>
> CREATE DATABASE ... [ COLLATE <collation name> ] ...
>
> ALTER DATABASE ... [ COLLATE <collation name> ] ...
>
>

I think catalog is good. Maybe attributes names colsyslccollate and
colsyslcctype should be more generic because they could be shared with ICU.

But collation function should be specified in CREATE COLLATION command.
Maybe CREATE COLLATION .... [STRCOL <fn name>]

Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

PostgreSQL

Saturday, May 31, 2008

Re: [HACKERS] Proposal - Collation at database level

No comments:

Blog Archive