Sunday, August 3, 2008

[GENERAL] bytea encode performance issues

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I am using postgresql 8.2.7 on gentoo for my dbmail backend.
I am also testing it on mysql 5.

I am trying to figure out if I need to tune my database configuration or
if querying a bytea field is just not practical in postgresql.

Searching with the mysql database takes under a minute and with the
postgresql database it takes approximately 10. It gets better when I fix
up the query a little, such as removing the group by and having and
including the clause as part of the where, but not anywhere close to the
mysql level.

This is the query that is used (I know it is not as efficient as it
could be, but this is the query it comes with):

SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
ON p.id = m.physmessage_id WHERE mailbox_idnr = 8 AND status IN (0,1 )
AND k.is_header = '0' GROUP BY m.message_idnr,k.messageblk HAVING
ENCODE(k.messageblk::bytea,'escape') LIKE '%John%'

The messageblk field is a bytea in postgresql and a longblob in mysql.
The only difference in the query is the MySQL does not need the encode
function.

I have plugged the query into the psql and mysql command line
applications, so I could evaluate the query without the application.

The database is using autovacuum and the estimated rows and the actual
rows are almost the same, so I assume it is working. There are 310266
rows in the dbmail_messageblks table.

Can someone make a suggestion for tuning the database?

The explain of the query is:
"HashAggregate (cost=43648.11..43648.85 rows=74 width=753)"
" -> Nested Loop (cost=42999.83..43647.74 rows=74 width=753)"
" -> Merge Join (cost=42999.83..43043.35 rows=74 width=769)"
" Merge Cond: (k.physmessage_id = m.physmessage_id)"
" -> Sort (cost=39264.12..39267.59 rows=1388 width=753)"
" Sort Key: k.physmessage_id"
" -> Seq Scan on dbmail_messageblks k
(cost=0.00..39191.68 rows=1388 width=753)"
" Filter: ((is_header = 0::smallint) AND
(encode(messageblk, 'escape'::text) ~~ '%John%'::text))"
" -> Sort (cost=3735.71..3754.59 rows=7552 width=16)"
" Sort Key: m.physmessage_id"
" -> Bitmap Heap Scan on dbmail_messages m
(cost=385.98..3249.26 rows=7552 width=16)"
" Recheck Cond: ((mailbox_idnr = 8) AND (status
= ANY ('{0,1}'::integer[])))"
" -> Bitmap Index Scan on dbmail_messages_8
(cost=0.00..384.10 rows=7552 width=0)"
" Index Cond: ((mailbox_idnr = 8) AND
(status = ANY ('{0,1}'::integer[])))"
" -> Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p (cost=0.00..8.15 rows=1 width=8)"
" Index Cond: (k.physmessage_id = p.id)"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkiVhHwACgkQjDX6szCBa+o6wACgwa05ZbUBL4Ef18N4JJHQ2SP1
gfwAnjIA14QktV/Qs1TrPiY+Ma+rmJht
=WOQM
-----END PGP SIGNATURE-----

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: