Monday, July 7, 2008

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:

>> So, the upshot is that the = and <> operators are not locale-aware,
>> yes? They just do byte comparisons. Is that really the way it
>> should be? I mean, could there not be strings that are equivalent
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized
> string. But postgres now assume that all utf8 strings are normalized.

I see. So binary equivalence is okay, in that case.

> If you need to implement < <= >= > operators you need to use strcol
> which take care of locale collation.

Which varstr_cmp() does, I guess. It's what textlt uses, for example.

> See unicode collation algorithm http://www.unicode.org/reports/tr10/

Wow, that looks like a fun read.

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: