> On Tue, 2008-09-16 at 17:01 +0300, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > Subtransactions cause a couple of problems for Hot Standby:
> > Do we need to treat subtransactions any differently from normal
> > transactions? Just treat all subtransactions as top-level transactions
> > until commit, and mark them all as committed when you see the commit
> > record for the top-level transaction.
> If we do that, snapshots become infinitely sized objects though, which
> then requires us to invent some way of scrolling it to disk. So having
> removed the need for subtrans, I then need to reinvent something similar
> (or at least something like a multitrans entry).

Currently we track only whether the subxid cache as a whole has
overflowed. It seems possible to track overflow for individual parts of
the cache instead. That makes the code path for subxid overflow in
GetSnapshotData() slightly slower, but that's not the common case, and
we save time elsewhere in more common cases.

We would be able to avoid making an entry in subtrans for new subxids
unless our local backend has overflowed its cache. That will reduce
subtrans access frequency considerably and greatly reduce the number of
requests that might need to perform I/O, possibly to zero. It would also
avoid the need for generating WAL records for new subxids for standby.
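
To make the assignment path concrete, here is a rough Python sketch of what I have in mind. It is not PostgreSQL code: the Backend class, the "subxid_overflow" record name, and the plain dict/list standing in for subtrans and WAL are all made up for illustration, and for simplicity it records the top-level xid as the parent rather than the immediate parent.

```python
class Backend:
    """Toy model of one backend's subxid cache (hypothetical names)."""

    def __init__(self, top_xid, cache_size):
        self.top_xid = top_xid
        self.cache_size = cache_size
        self.subxid_cache = []
        self.overflowed = False

    def assign_subxid(self, subxid, subtrans, wal):
        # Common case: the subxid fits in the local cache, so we touch
        # neither subtrans nor WAL.
        if len(self.subxid_cache) < self.cache_size:
            self.subxid_cache.append(subxid)
            return
        # Overflow case: only now do we pay for a subtrans entry and a
        # WAL record so the standby can learn about this subxid.
        self.overflowed = True
        subtrans[subxid] = self.top_xid
        wal.append(("subxid_overflow", subxid, self.top_xid))


subtrans, wal = {}, []
backend = Backend(top_xid=100, cache_size=2)
for subxid in (101, 102, 103):
    backend.assign_subxid(subxid, subtrans, wal)
```

With a cache of two entries, only the third subxid reaches subtrans and WAL; the first two cost nothing beyond the cache insert.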

The path through XidInMVCCSnapshot() would then require us to *always*
check the subxid cache, even if it has overflowed. If we find the xid
there, we don't need to check subtrans at all. That's quite useful
because searching the subxid cache is cheaper than looking in subtrans,
and the probability that the xid is in the cache rather than in subtrans
is still good, even for overflows of up to 3-5 times the subxid cache
size. It would slightly increase the cost of subxid checking when
running with very high numbers of subxids.
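
The lookup order described above could look roughly like the following Python sketch. Again this is only an illustration under my own assumptions: the Snapshot fields loosely mirror a snapshot's xid and subxid arrays but are invented here, and subtrans is modelled as a dict from child xid to parent xid.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Toy snapshot; field names are illustrative only."""
    xip: set = field(default_factory=set)      # top-level xids in progress
    subxip: set = field(default_factory=set)   # cached subxids
    suboverflowed: bool = False                # did some cache overflow?


def xid_in_snapshot(xid, snap, subtrans):
    """Return (in_progress, consulted_subtrans)."""
    # Always try the cheap in-memory check first, even on overflow.
    if xid in snap.xip or xid in snap.subxip:
        return True, False
    # Only a cache miss on an overflowed snapshot falls back to subtrans.
    if snap.suboverflowed:
        top = xid
        while top in subtrans:
            top = subtrans[top]
        return top in snap.xip, True
    return False, False


snap = Snapshot(xip={100}, subxip={101, 102}, suboverflowed=True)
subtrans = {103: 100}
hit = xid_in_snapshot(101, snap, subtrans)
miss_then_subtrans = xid_in_snapshot(103, snap, subtrans)
unknown = xid_in_snapshot(200, snap, subtrans)
```

The second element of the result shows when subtrans was consulted: never for a cached subxid, and only after a cache miss when the snapshot has overflowed.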

For Hot Standby, this would mean we could avoid generating WAL records
for new subxids in most cases, generating them only when our backend's
subxid cache has overflowed. On the standby, it means we can store xids
into a fixed-size snapshot without worrying about whether it overflows:
either the xids all fit in the snapshot on the master (whose xids we
are emulating), *or* we have a WAL record that tells us the cache
overflowed, and we make the insert into subtrans instead. When we use
the standby snapshot, we look in the subxid cache first, and if we don't
find the xid there, we check subtrans.
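
A rough sketch of the standby side, in Python and purely illustrative: StandbyXidTracker and its methods are hypothetical names, a single cache stands in for the whole snapshot, and the way xids are "observed" in ordinary WAL records is an assumption of the sketch, not a description of existing code.

```python
class StandbyXidTracker:
    """Toy model of the standby's fixed-size subxid store."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.subxid_cache = set()
        self.subtrans = {}

    def apply_overflow_record(self, subxid, parent):
        # The master told us its cache overflowed for this subxid,
        # so it goes into subtrans instead of the cache.
        self.subtrans[subxid] = parent

    def observe_xid(self, xid):
        # An xid seen in an ordinary WAL record. If no overflow record
        # covered it, it fit on the master, so it must fit here too.
        if xid in self.subtrans or xid in self.subxid_cache:
            return
        if len(self.subxid_cache) >= self.cache_size:
            raise RuntimeError("xid did not fit and no overflow record seen")
        self.subxid_cache.add(xid)

    def locate(self, xid):
        # Cache first, subtrans second.
        if xid in self.subxid_cache:
            return "cache"
        if xid in self.subtrans:
            return "subtrans"
        return None


tracker = StandbyXidTracker(cache_size=2)
tracker.observe_xid(101)
tracker.observe_xid(102)
tracker.apply_overflow_record(103, 100)
tracker.observe_xid(103)
```

The standby's cache never needs to grow beyond the master's size, because any subxid that did not fit on the master arrives with an explicit overflow record.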
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support