Saturday, August 2, 2008

Re: [HACKERS] [WIP] patch - Collation at database level

On Sat, Aug 02, 2008 at 03:39:18PM +0200, Radek Strnad wrote:
> > I also think that the clauses you have attached to your CREATE
> > COLLATION statement (case-insensitive, accent-insensitive) are an
> > oversimplification of reality. I suggest you look up the Unicode
> > collation algorithm to learn about how collations work in practice.
>
> I already did at the very beginning of the development. The reason why I'm
> not implementing the whole Unicode collation algorithm is that this patch
> should be a sort of framework. You'll be able to use different collation
> functions, not only POSIX locales, so further development towards the full
> Unicode collation algorithm is possible.

Agreed. Of course it's a simplification of reality. POSIX locales are a
simplification of reality too, but they're the only form of collation
currently available to us. And quite frankly, I don't believe PostgreSQL
should be in the business of writing collation algorithms; we don't have
the expertise.

FWIW, I think case-insensitive and accent-insensitive are useful modifiers
that we should aim to support in the future.
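
To make the two modifiers concrete, here is a rough sketch of what they mean as sort-key transformations. It is only an illustration in Python, not the patch's implementation and not the Unicode collation algorithm; the names strip_accents and collation_key are made up for this example, and the approach (decompose, drop combining marks, case-fold) is exactly the kind of simplification discussed above.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collation_key(text, case_insensitive=False, accent_insensitive=False):
    """Build a naive sort key honouring the two hypothetical modifiers."""
    if accent_insensitive:
        text = strip_accents(text)
    if case_insensitive:
        text = text.casefold()
    return text


words = ["Zebra", "école", "Ecole", "apple"]

# Plain code-point order: uppercase sorts before lowercase, accents last.
plain = sorted(words)

# Case- and accent-insensitive order: "école" and "Ecole" compare equal,
# so the stable sort keeps them in their input order.
relaxed = sorted(words, key=lambda w: collation_key(w, True, True))
```

A real collation would need per-language tailoring and multi-level comparison, which is why delegating to an existing library or to the operating system's locales looks more attractive than writing this ourselves.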

> At the end of next week I'll publish my bachelor thesis concerning this
> topic, where everything will be explained in detail, so stay tuned.

Good luck!

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
