Sunday, August 3, 2008

Re: [GENERAL] bytea encode performance issues

The LIKE operator is likely the problem, but it is a critical part of an
email application. Searches are done by, "Show me all emails containing
the following word."

I've tried using TSearch2's full text index. It made the query 50%
faster, taking 5 minutes. This is still not even close to the less then
a minute in MySQL.

>
> That LIKE operator is probably your problem. An unbounded LIKE like that
> (with a wildcard on both sides) means no index can be used, hence you
> get a sequential scan.
>
> There are apparently some possibilities with the new GIN indexes (or
> maybe even using GIST), but I haven't had an opportunity to try those
> yet. There were some messages about just that on this list recently.
>

I don't think a functional index (or anything other then a FTI) would
accomplish anything, being that I am doing unbounded Likes.

> If you create an index, make sure you create a _functional_ index over
> ENCODE(messageblk, 'escape').
>

Email is binary when it contains attachments. I actually planned on
using an ASCII encoding, but the dbmail people specifically said not to.
I don't know if they were speaking from experience, or because ASCII
sounds bad.

> Since when is e-mail binary data? I don't quite see why you'd use a
> bytea field instead of text. If your problem is character encoding, then
> just don't store that ("encode" the DB using SQLASCII).
>

As I mentioned, this is the system that came with dbmail. It runs on
both PostGresql and MySQL, so they may have done some compatibility
things. There are 4 statuses possible, 0,1,2,3 if you use the database
through the software then a 99 could never appear there.

>
>
> I notice some other oddities in that query/design. Why is is_header a
> smallint instead of a boolean? I'm assuming this is for compatibility
> with an other database?
>
> Why use status IN (0, 1) instead of more descriptive keys? Is it even
> constrained this way, or could arbitrary numbers end up as status (say
> 99) and if so, what happens to those messages?
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:824,48958f30243481673380013!
>
>

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: