Thursday, July 10, 2008

[HACKERS] [WIP] collation support revisited (phase 1)

Hi,

After a long discussion with Mr. Kotala, we have decided to redesign our collation support proposal.
For those of you who are not familiar with my WIP patch and the comments from other hackers, here is the original mail: http://archives.postgresql.org/pgsql-hackers/2008-07/msg00019.php

In short: I am writing collation support for PostgreSQL that is almost independent of the underlying collation function. I will implement POSIX locales, but switching to ICU should be quite easy. The collations and character sets defined by the SQL standard will be hard-coded, so they always exist and no function can run into a missing one.
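To illustrate what "almost independent of the collation function" could mean, here is a minimal sketch. It assumes a hypothetical `CollationProvider` wrapper: sorting code only ever calls a comparator, and a POSIX provider simply delegates to `strcoll` after setting `LC_COLLATE`. None of these names come from the patch. An ICU-backed provider would be one more constructor that returns the same kind of object.

```python
import functools
import locale
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CollationProvider:
    """Hypothetical interface: callers only see a name and a comparator."""
    name: str
    compare: Callable[[str, str], int]  # <0, 0 or >0, like strcmp()


def codepoint_provider() -> CollationProvider:
    # Binary ordering by code point; needs no locale support at all.
    return CollationProvider("codepoint", lambda a, b: (a > b) - (a < b))


def posix_provider(locale_name: str) -> CollationProvider:
    # Validate the locale up front: setlocale() raises locale.Error if unknown.
    locale.setlocale(locale.LC_COLLATE, locale_name)

    def compare(a: str, b: str) -> int:
        # LC_COLLATE is process-global, so re-apply it before comparing.
        # Acceptable for a sketch; a real backend would set it once.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return locale.strcoll(a, b)

    return CollationProvider(f"posix:{locale_name}", compare)


def collate_sort(values: List[str], provider: CollationProvider) -> List[str]:
    # The sorting code depends only on the provider interface,
    # never on POSIX or ICU directly.
    return sorted(values, key=functools.cmp_to_key(provider.compare))
```

With the "C" locale, both providers produce the same plain byte order. A locale such as cs_CZ.UTF-8 would order the same strings differently without any change to `collate_sort`.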

The whole project will be divided into two phases:

phase 1
Implement a "sort of framework" so that PostgreSQL has the basic guts (pg_collation & pg_charset catalogs, CREATE COLLATION, collation support for each type that needs it) and supports collation at the database level. This phase has been accepted as a Google Summer of Code project.

phase 2
Implement the rest: full collation support at the column level. I will continue working on this after finishing phase 1, and it will be my master's thesis.

How will the first part work?

Catalogs
- new catalogs pg_collation and pg_charset will be defined
- pg_collation and pg_charset will contain the SQL standard collations plus an optional default collation (when one other than a SQL standard collation is set)
- pg_type, pg_attribute and pg_namespace will be extended with references to the default records in pg_collation and pg_charset
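The catalog layout above can be sketched as plain records linked by OID, together with a check that every reference resolves. This is only an illustration. The field names (`charset_oid`, `posix_locale`, `collation_oid`) and the `dangling_references` helper are hypothetical and are not the columns the patch defines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PgCharset:
    oid: int
    name: str


@dataclass(frozen=True)
class PgCollation:
    oid: int
    name: str
    charset_oid: int              # references PgCharset.oid
    posix_locale: Optional[str]   # None for the hard-coded SQL standard rows


@dataclass(frozen=True)
class DefaultRef:
    """Hypothetical new columns of one pg_type/pg_attribute/pg_namespace row."""
    catalog: str
    row_name: str
    collation_oid: int            # references PgCollation.oid
    charset_oid: int              # references PgCharset.oid


def dangling_references(charsets: List[PgCharset],
                        collations: List[PgCollation],
                        refs: List[DefaultRef]) -> List[str]:
    """Describe every reference that points at a row that does not exist."""
    charset_oids = {c.oid for c in charsets}
    collation_oids = {c.oid for c in collations}
    problems = [f"pg_collation {c.name}: unknown charset {c.charset_oid}"
                for c in collations if c.charset_oid not in charset_oids]
    for r in refs:
        if r.collation_oid not in collation_oids:
            problems.append(f"{r.catalog} {r.row_name}: unknown collation {r.collation_oid}")
        if r.charset_oid not in charset_oids:
            problems.append(f"{r.catalog} {r.row_name}: unknown charset {r.charset_oid}")
    return problems
```

An empty result means that every default reference in pg_type, pg_attribute and pg_namespace points at real rows in the two new catalogs.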

initdb
- pg_collation & pg_charset will each contain the pre-defined SQL standard records and, optionally, one non-standard record set when running initdb (the one using the system locale)
- these two default records (one in pg_collation, one in pg_charset) will be referenced by the relevant columns of pg_type, pg_attribute and pg_namespace, and will be considered the default collation, which is inherited
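Here is a rough sketch of the initdb step for pg_collation only; pg_charset would be handled the same way. The placeholder collation names, the starting OID and the rule "no system locale means the first standard row is the default" are all assumptions made for this illustration.

```python
from typing import Dict, Optional, Tuple

# Illustrative placeholder names; the real hard-coded list follows the SQL standard.
SQL_STANDARD_COLLATIONS = ("SQL_TEXT", "UCS_BASIC")
FIRST_OID = 100  # hypothetical first OID for the bootstrap rows


def bootstrap_pg_collation(system_locale: Optional[str]) -> Tuple[Dict[int, dict], int]:
    """Build the initial pg_collation rows and return them with the default OID.

    Assumption: system_locale is None when initdb runs with a collation that
    is already one of the SQL standard ones; otherwise it names the
    non-standard system locale that gets its own row.
    """
    rows: Dict[int, dict] = {}
    for i, name in enumerate(SQL_STANDARD_COLLATIONS):
        rows[FIRST_OID + i] = {"name": name, "locale": None}
    # Assumption: without a system locale, the first standard row is the default.
    default_oid = FIRST_OID
    if system_locale is not None:
        default_oid = FIRST_OID + len(SQL_STANDARD_COLLATIONS)
        rows[default_oid] = {"name": "default", "locale": system_locale}
    return rows, default_oid


def default_references(default_oid: int) -> Dict[str, int]:
    # pg_type, pg_attribute and pg_namespace all start out pointing at the
    # same default row, which objects created later inherit.
    return {catalog: default_oid
            for catalog in ("pg_type", "pg_attribute", "pg_namespace")}
```

Running initdb with, say, cs_CZ.UTF-8 would then add a single extra row, and all three catalogs would reference it.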

CREATE DATABASE ... COLLATE ...
- after the new database is copied, its collation will either be the default (the same as the cluster collation) or the one given in the COLLATE clause. Then we update the pg_type, pg_attribute and pg_namespace catalogs
- reindex the database
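The three CREATE DATABASE steps (copy, repoint the default references, reindex) can be modelled on an in-memory "database". The model is deliberately simple and hypothetical. Collations are reduced to sort-key functions (`binary` and `nocase` stand in for real locales), and an index is just a sorted list of keys. This makes it visible why the reindex is needed: index order depends on the collation.

```python
import copy
from typing import Callable, Dict, Optional

# Hypothetical stand-ins for real collations: name -> sort key function.
SORT_KEYS: Dict[str, Callable[[str], str]] = {
    "binary": lambda s: s,       # code-point order
    "nocase": str.casefold,      # case-insensitive order
}

CATALOGS_WITH_DEFAULTS = ("pg_type", "pg_attribute", "pg_namespace")


def reindex_database(db: dict, collation_name: str) -> None:
    # Rebuild every (toy) index so its order matches the new collation.
    key = SORT_KEYS[collation_name]
    for name, entries in db["indexes"].items():
        db["indexes"][name] = sorted(entries, key=key)


def create_database(template: dict, collate: Optional[str] = None) -> dict:
    db = copy.deepcopy(template)                 # 1. copy the template database
    if collate is None:
        return db                                # keep the cluster collation
    by_name = {name: oid for oid, name in db["pg_collation"].items()}
    if collate not in by_name:
        raise ValueError(f'collation "{collate}" does not exist')
    new_oid = by_name[collate]
    for catalog in CATALOGS_WITH_DEFAULTS:       # 2. repoint default references
        for row in db[catalog]:
            row["collation"] = new_oid
    reindex_database(db, collate)                # 3. rebuild collation-dependent indexes
    return db
```

Without a COLLATE clause the copy is left as-is, which matches "the collation will be default (same as cluster collation)".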

When switching databases, the database collation will be retrieved from the entry for type text in pg_type. This should be the only part that has to be removed when moving on to phase 2. But that will take a while :-)
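Here is a minimal sketch of that phase 1 shortcut. On connecting to a database, look up the pg_type row for text and follow its collation reference. The row layout and both function names are hypothetical. Returning `None` for the hard-coded SQL standard collations is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def database_collation(pg_type: List[dict], pg_collation: Dict[int, dict]) -> dict:
    """Phase 1 shortcut: the database collation is whatever 'text' references."""
    text_row = next((r for r in pg_type if r["typname"] == "text"), None)
    if text_row is None:
        raise LookupError("pg_type has no entry for type text")
    return pg_collation[text_row["collation"]]


def locale_for_backend(pg_type: List[dict], pg_collation: Dict[int, dict]) -> Optional[str]:
    # A POSIX-based backend would pass this to setlocale(LC_COLLATE, ...)
    # after connecting; None means a hard-coded SQL standard collation.
    return database_collation(pg_type, pg_collation)["locale"]
```

In phase 2 this lookup would give way to per-column collations, which is why it is the part expected to be removed.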

Thanks for all your comments

    Regards

        Radek Strnad
