Wednesday, June 25, 2008

[HACKERS] Latest on CITEXT 2.0

Howdy,

I just wanted to report the latest on my pet project: implementing a
new case-insensitive text type, "citext", to be locale-aware and to
build and run on PostgreSQL 8.3. I'm not much of a C programmer (this
is only the second time I've written *anything* in C), so I also have
a few questions about my code, best practices, coverage, etc. You can
grab the latest here:

https://svn.kineticode.com/citext/trunk/

BTW, the tests in sql/citext.sql use the pgtap.sql file to run TAP
regression tests. So you can run them using `make installcheck` or
`make test`. The latter requires that pg_prove be installed; you can
get it here:

https://svn.kineticode.com/pgtap/trunk/

Anyway, I think I've got it pretty close to done. The tests cover a
lot of stuff -- nearly everything I could figure out, anyway. But
there are a few gaps.

As a result, I'd appreciate a little help with these questions, all in
the name of making this a solid data type suitable for use on
production systems:

* There still seem to be some implicit casts to text that I'd like to
duplicate. For example, `select '192.168.1.2'::cidr::text;` works, but
`select '192.168.1.2'::cidr::citext;` does not. Where can I find the C
functions that do these casts for TEXT so that I can put them to work
for citext, too? The internal cast functions used in the old citext
distribution don't exist at all on 8.3.

* There are casts from text that I'd also like to harness for use by
citext, like `cidr(text)`. Where can I find these C functions as well?
(The upshot of this and the previous points is that I'd like citext to
be as compatible with TEXT as possible, and I just need to figure out
how to fill in the gaps in that compatibility.)

* Regular expression and LIKE comparisons using the operators
properly work case-insensitively, but functions like replace() and
regexp_replace() do not. Should they? And if so, how can I make them
do so?

* The tests assume that LC_COLLATE is set to en_US.UTF-8. Does that
work well for standard PostgreSQL regression tests? How are locale-
sensitive tests run in core regression tests?

* As for my C programming, well, what's broken? I'm especially
concerned that I pfree variables appropriately, but I'm not at all
clear on what needs to be freed. Martijn mentioned before that btree
comparison functions free memory, but I'm such a C n00b that I don't
know what that actually means for my implementation. I'd actually
appreciate a bit of pedantry here. :-)

* Am I in fact getting an appropriate nul-terminated string in my
cilower() function using this code?

    char * str = DatumGetCString(
        DirectFunctionCall1( textout, PointerGetDatum( arg ) )
    );
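
To make the last question concrete, here is a minimal standalone sketch
of what "getting a NUL-terminated string" out of a length-counted value
involves. It is plain C, not PostgreSQL code: the `counted_text` struct
and `counted_to_cstring()` are hypothetical names made up for
illustration, and malloc() stands in for palloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted text value.
 * This is NOT PostgreSQL's varlena layout, just an illustration. */
typedef struct {
    size_t      len;   /* number of bytes in data */
    const char *data;  /* bytes, not necessarily NUL-terminated */
} counted_text;

/* Copy the counted bytes into a fresh buffer that is one byte longer
 * and terminate it. The caller owns the result and must free() it. */
char *counted_to_cstring(const counted_text *t)
{
    char *out;

    assert(t != NULL);
    out = malloc(t->len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, t->data, t->len);
    out[t->len] = '\0';
    return out;
}
```

The point of the sketch is that the terminator has to be added
explicitly by whatever does the copy, and that the copy is a new
allocation that someone has to release.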

Those are all the questions I had about my implementation. I'd like to
get this thing done and released soon, so that I can be done with this
particular Yak and get back to what I'm *supposed* to be doing with my
time.

BTW, would there be any interest in this code going into contrib/ in
the distribution? I think that, if we can ensure that it works just
like LOWER() = LOWER(), but without requiring that code, then it would
be a great type to point people to instead of that SQL hack
(with all the usual caveats about it being locale-sensitive and not
canonically case-insensitive in the Unicode sense). If so, I'd be
happy to make whatever changes are necessary to make it fit in with
the coding and organization standards of the core and to submit it.
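
For anyone who wants the LOWER() = LOWER() behavior spelled out, the
idea can be sketched in plain C roughly as follows. This is a
standalone illustration under simplifying assumptions, not the code in
the repository: `lower_copy()` and `ci_compare()` are hypothetical
names, tolower() handles single-byte characters only, and strcoll()
uses whatever locale the program has set (the "C" locale unless
setlocale() is called).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd lowercase copy of s.
 * Single-byte characters only; multibyte text would need more care. */
static char *lower_copy(const char *s)
{
    size_t len = strlen(s);
    char  *out = malloc(len + 1);
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[len] = '\0';
    return out;
}

/* Compare a and b the way LOWER(a) vs LOWER(b) would: lowercase both,
 * then do a locale-aware comparison. Returns <0, 0, or >0.
 * The temporary copies are freed before returning. */
int ci_compare(const char *a, const char *b)
{
    char *la, *lb;
    int   result;

    assert(a != NULL && b != NULL);
    la = lower_copy(a);
    lb = lower_copy(b);
    if (la == NULL || lb == NULL) {
        free(la);
        free(lb);
        abort();  /* out of memory; a real implementation would report it */
    }
    result = strcoll(la, lb);
    free(la);
    free(lb);
    return result;
}
```

Note that both lowercase copies are released on every path, which is
the same discipline the comparison functions for an index need to
follow.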

But please, don't expect a civarchar type from me anytime soon. ;-)

Many thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
