Thursday, July 17, 2008

Re: [HACKERS] [PATCH]-hash index improving

-----Original Message-----

> From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs
> Sent: Thursday, July 17, 2008 4:10 PM

> To: Jonah H. Harris

> Cc: Xiao Meng; pgsql-hackers@postgresql.org; Kenneth Marshall

> Subject: Re: [HACKERS] [PATCH]-hash index improving

>

>

> On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote:

> > I'm not really seeing any performance improvements over b-tree.

>

> I'd like to see a theoretical analysis on the benefit case before we

> spend too many teraflops on investigation.

>

> In which cases do we expect that hash indexes will beat btrees?

Large table unique index equality search should be very fast with hashed
index (and the only place where any advantage will be seen). Hashed
indexes are useless for any search besides equality and gain more and
more when the levels of the b-tree index increase.

Here is a hash index lookup that is frightfully fast:
http://www.corpit.ru/mjt/tinycdb.html

It's a constant database, but the file format might be worth
examination. Here is a quickie definition of the CDB format:
http://cr.yp.to/cdb/cdb.txt

I think that the problem with hashed indexes tends to be the blocking of
index pages on disk. For instance, the FastDB/GigaBase author was able
to make FastDB memory based hash indexes that are faster than the memory
based b-tree versions, but not for the disk based GigaBase database.
The only way to get better performance from hash based indexes is to
read fewer index pages than if a tree-based index were used. So I think
that the scheme used to create the index pages is the focus to make them
worthwhile.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: