Sunday, June 29, 2008

Re: [HACKERS] TODO item: Allow data to be pulled directly from indexes

"Karl Schnaitter" <karlsch@soe.ucsc.edu> writes:

> (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has
> noted (in the linked message) that this is not reliable if the index has any
> expression-valued columns, because it is not always possible to find the old
> index entry. For this reason, the proposed patch does not keep visibility
> metadata for indexes on expressions. This seems like a reasonable limitation;
> indexed expressions are just less efficient.

Or if the index operators and btproc aren't quite as immutable as they claim.
That's probably less likely than non-immutable index expressions, but it's also possible.

> I should mention there is a major flaw in the patch, because it puts pointers
> to HOT tuples in the index, in order to capture the different transaction ids
> in the chain. I think this can be fixed by only pointing to the root of the HOT
> chain, and setting xmin/xmax to the entire range of transaction ids spanned by
> the chain. I'm not sure about all the details (the ctid and some other bits
> also need to be set).

I think you can treat a HOT chain as a single tuple. The xmin of the head
is the xmin of the chain and the xmax of the tail is the xmax of the chain.
The xmin/xmax of the intermediate versions are only interesting for
determining *which* of the HOT versions to look at, but the index pointer
points to the whole chain.
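The "chain as one tuple" idea can be sketched as follows, under simplifying assumptions. The chain is given as a list ordered from head (the root the index points to) to tail. Transaction ids are plain integers with no wraparound. `None` means the tail has not been deleted or updated. `HeapVersion` and `chain_visibility_range` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapVersion:
    xmin: int            # inserting transaction id
    xmax: Optional[int]  # deleting/updating transaction id; None if still live


def chain_visibility_range(chain: list[HeapVersion]) -> tuple[int, Optional[int]]:
    """Treat a HOT chain as one logical tuple: head's xmin, tail's xmax.

    Intermediate versions only matter for picking *which* member of the
    chain to return; the index entry itself covers the whole range.
    """
    if not chain:
        raise ValueError("empty HOT chain")
    return chain[0].xmin, chain[-1].xmax


chain = [HeapVersion(100, 105), HeapVersion(105, 110), HeapVersion(110, None)]
print(chain_visibility_range(chain))  # (100, None)
```

With this model, a single index entry pointing at the chain root would carry `(100, None)` instead of one entry per HOT version.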

> (2) & (3) can work for any index, and they are quite elegant in the way that
> the overhead does not change with the number of indexes. The TODO also notes
> the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great
> idea in general, but it does not serve the intended purpose of this TODO item.
> Once a page gets marked as requiring visibility checks, it cannot be unmarked
> until the next VACUUM. The whole point of this feature is that we are willing
> to be more proactive during updates in order to make index access more
> efficient.

Well, I think that's precisely the point. If you're trading off work done at
update time against work done for index accesses, then you're only going to win
if the tuples are relatively static and have lots of accesses done against
them between updates. In that case, having the optimization kick in only once
the page has been static long enough for all its tuples to be globally
visible should be good enough.
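Below is a toy model of the visibility-map trade-off discussed above. It assumes one "all-visible" bit per heap page. Any modification of the page clears the bit, and only VACUUM sets it again. An index scan skips the heap only for hits on pages whose bit is set. The class and function names are hypothetical and do not reflect PostgreSQL's actual implementation.

```python
class VisibilityMap:
    """Toy model: one 'all-visible' bit per heap page."""

    def __init__(self, npages: int) -> None:
        self.bits = [False] * npages

    def clear(self, page: int) -> None:
        # Any INSERT, UPDATE or DELETE touching the page clears its bit.
        self.bits[page] = False

    def set_all_visible(self, page: int) -> None:
        # Only VACUUM sets the bit, after checking every tuple is globally visible.
        self.bits[page] = True

    def needs_heap_fetch(self, page: int) -> bool:
        # An index scan can trust the index alone only on all-visible pages.
        return not self.bits[page]


def count_heap_fetches(vm: VisibilityMap, tids: list[tuple[int, int]]) -> int:
    """Count index hits, given as (page, offset), that still require a heap visit."""
    return sum(1 for page, _offset in tids if vm.needs_heap_fetch(page))


vm = VisibilityMap(3)
vm.set_all_visible(0)
vm.set_all_visible(1)
vm.clear(1)  # an UPDATE lands on page 1; it stays marked until the next VACUUM
tids = [(0, 1), (0, 2), (1, 1), (2, 5)]
print(count_heap_fetches(vm, tids))  # 2
```

Static pages (page 0 here) give heap-free index access. Any page touched since the last VACUUM falls back to a heap visit, which costs nothing extra at update time beyond clearing one bit.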

Index visibility info might win over a visibility map if the tuples are being
heavily updated by long-lived transactions. In that case they never stay
globally visible for very long, but having the xmin/xmax in the index might
avoid a heap access for tuples whose inserting transaction hasn't committed
yet.
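This is a rough sketch of the case above: an index entry carrying its own xmin/xmax can be checked against a snapshot without visiting the heap. The visibility rule is heavily simplified. It ignores the current transaction's own changes, subtransactions, hint bits, and xid wraparound. Commit status is modeled as a plain set, whereas a real system would still have to consult transaction status. `Snapshot`, `xid_visible` and `index_entry_visible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    xmax: int                              # first xid not yet started at snapshot time
    xip: frozenset[int] = frozenset()      # xids still in progress at snapshot time


def xid_visible(xid: int, snap: Snapshot, committed: set[int]) -> bool:
    """True if xid committed before the snapshot was taken (simplified)."""
    return xid < snap.xmax and xid not in snap.xip and xid in committed


def index_entry_visible(xmin: int, xmax: Optional[int],
                        snap: Snapshot, committed: set[int]) -> bool:
    # Inserter not visible (e.g. a long-running transaction): skip with no heap access.
    if not xid_visible(xmin, snap, committed):
        return False
    # Visible unless a deleter/updater is itself visible to the snapshot.
    return xmax is None or not xid_visible(xmax, snap, committed)


snap = Snapshot(xmax=200, xip=frozenset({150}))
committed = {100, 120}
print(index_entry_visible(150, None, snap, committed))  # False: inserter still running
print(index_entry_visible(100, 150, snap, committed))   # True: deleter still running
```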

As you seem to realize, there has been a lot of discussion in this area
already. The visibility map looks like a much more popular direction.


--
Gregory Stark
EnterpriseDB

http://www.enterprisedb.com

Ask me about EnterpriseDB's RemoteDBA services!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
