Saturday, July 12, 2008

Re: [HACKERS] PATCH: CITEXT 2.0 v3

On Jul 12, 2008, at 15:13, David E. Wheeler wrote:

>> Sadly, I think you have to give up attempts to check the
>> interesting multibyte cases and confine yourself to tests using
>> ASCII strings.
>
> Grr. Kind of defeats the purpose. Is there no infrastructure for
> testing multibyte functionality? Are test database clusters all
> built using SQL_ASCII and the C locale?

I just tried to take your modified tests and add multibyte tests that
run only on OS X with en_US.UTF-8. They worked like this:

CREATE OR REPLACE FUNCTION test_multibyte ()
RETURNS BOOLEAN AS $$
    -- True only on OS X with the en_US.UTF-8 collation.
    SELECT version() ~ 'apple-darwin'
       AND (SELECT setting FROM pg_settings WHERE name = 'lc_collate')
           = 'en_US.UTF-8';
$$ LANGUAGE SQL STABLE;  -- reads pg_settings, so not IMMUTABLE

SELECT 'À'::citext = 'À'::citext WHERE test_multibyte() = true;
SELECT 'À'::citext = 'à'::citext WHERE test_multibyte() = true;

But then I realized that this would change the expected output
depending on the platform (one row where test_multibyte() is true, no
rows everywhere else), and thus make the tests fail. This is one
reason why the inflexibility of the existing regression test model is
a drag: it limits you to testing only what works everywhere!

Grrr.
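
One possible way to keep the output identical everywhere, offered only
as an untested sketch: fold the platform check into the SELECT list so
that every platform returns exactly one row. This reuses the
test_multibyte() function from above; the column alias "t" is
arbitrary, and the approach assumes the server encoding accepts the
non-ASCII literals at all.

```sql
-- Sketch only: always yields one row, whatever the platform.
-- Where test_multibyte() is true, the comparison is really exercised;
-- elsewhere the ELSE branch supplies a constant true, so the check is
-- vacuous there but the expected output does not change.
SELECT CASE WHEN test_multibyte()
            THEN 'À'::citext = 'à'::citext
            ELSE true
       END AS t;
```

The trade-off is that a regression on a non-matching platform goes
unnoticed, since the ELSE branch hides it.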

I'll remove all the multibyte character tests, but I have to say that,
as a result, the CITEXT 1 module would likely pass all the remaining
tests, even though it isn't locale-aware. How can one add regression
tests to ensure that something truly is locale-aware?

Thanks,

David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
