Tuesday, July 22, 2008

Re: [HACKERS] PATCH: CITEXT 2.0 v4

On Jul 18, 2008, at 01:39, Michael Paesold wrote:

> Calling regex functions with the case-insensitivity option would be
> great. It should also be possible to rewrite replace() into
> regexp_replace() by first escaping the regex meta characters.
>
> Actually re-implementing those functions in a case insensitive way
> would still be an option, but of course some amount of work. The
> question is, how much use case there is.

I've figured out how to make all the functions work using SQL function
workarounds, converting things and re-dispatching to the text versions
as appropriate. They work quite well, and can be converted to C later
if that becomes a requirement.

Meanwhile, on the question of whether or not regular expression and
LIKE comparisons *should* match case-insensitively, I have a couple
more observations:

* Thinking about how a true case-insensitive collation would work, I'm
quite certain that it would match case-insensitively. Anything else
would just be unexpected, because in a case-insensitive collation,
lowercase characters are, in practice, identical to uppercase
characters. As far as matching is concerned, there is no difference
between them. So the matching operators and functions against CITEXT
should follow that assumption.

* I tried a few matches on MySQL, where the collation is case-
insensitive by default, and it confirms my impression:

mysql> select 'Foo' regexp 'o$';
+-------------------+
| 'Foo' regexp 'o$' |
+-------------------+
| 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' regexp 'O$';
+-------------------+
| 'Foo' regexp 'O$' |
+-------------------+
| 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%o';
+-----------------+
| 'Foo' like '%o' |
+-----------------+
| 1 |
+-----------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%O';
+-----------------+
| 'Foo' like '%O' |
+-----------------+
| 1 |
+-----------------+
1 row in set (0.00 sec)

I'll grant that MySQL may not be the best model for how things should
work, but it's something, at least. Anyone else got access to another
database with case-insensitive collations to see how LIKE and regular
expressions work?

Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: