Sunday, June 29, 2008

Re: [HACKERS] TODO item: Allow data to be pulled directly from indexes

Gregory Stark wrote:
>> (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has
>> noted (in the linked message) that this is not reliable if the index has any
>> expression-valued columns, because it is not always possible to find the old
>> index entry. For this reason, the proposed patch does not keep visibility
>> metadata for indexes on expressions. This seems like a reasonable limitation
>> --- indexed expressions are just less efficient.
>>
>
> Or if the index operators and btproc aren't nearly as immutable as they claim.
> Probably less likely than non-immutable index expressions but also possible.
>
>
Your point is well taken... I'll have to look into that more.

>> (2) & (3) can work for any index, and they are quite elegant in the way that
>> the overhead does not change with the number of indexes. The TODO also notes
>> the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great
>> idea in general, but it does not serve the intended purpose of this TODO item.
>> Once a page gets marked as requiring visibility checks, it cannot be unmarked
>> until the next VACUUM. The whole point of this feature is that we are willing
>> to be more proactive during updates in order to make index access more
>> efficient.
>>
>
> Well I think that's precisely the point. If you're trading off work done at
> update time against work done for index accesses then you're only going to win
> if the tuples are relatively static and have lots of accesses done against
> them between updates. In which case having the optimization only kick in when
> the page has been static for long enough that all the tuples are globally
> visible should be good enough
I really don't understand this point. The way I see the visibility map
working is as follows: we set a page to "requires visibility check" when
a tuple on the page is inserted, deleted, or non-HOT updated. If the
only modifications have been inserts, we can reset the status to "all
tuples visible" when these tuples become universally visible, which
matches your description. But in the presence of deletes and updates, we
can only reset the status of a page after a VACUUM (I know that dead HOT
tuples can be pruned without VACUUM, but I don't think that's the case
for indexed tuples). We can't reset the status earlier because we don't
know what indexes still have pointers to the dead tuples. So a page can
be static indefinitely (after a single modification) without ever
getting to enjoy the optimization.

This is a really important point, so please let me know if I'm missing
something.

Thanks for your response!
Karl

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: