Monday, September 22, 2008

[HACKERS] FSM, now without WAL-logging

Attached is a revamped version of the FSM rewrite. WAL-logging is now
gone. Any inconsistencies between levels of the FSM are fixed during
vacuum, and by searchers when they run into a dead end because of a
discrepancy. Corruption within FSM pages is likewise fixed by vacuum and
searchers.
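
To illustrate the repair-on-dead-end idea, here is a minimal sketch for a
single page. It is not the code from the patch: the names (ToyFsmPage,
toy_search, toy_rebuild) and the 8-leaf page size are made up for the
example, and it assumes each inner node of the in-page binary tree holds the
maximum of its two children. A searcher that trusts an inner node but finds
neither child large enough rebuilds the upper nodes and retries.

```c
#include <assert.h>
#include <stdint.h>

#define LEAVES 8                     /* hypothetical: leaf slots per page */
#define NODES  (2 * LEAVES - 1)      /* complete binary tree stored in an array */

typedef struct { uint8_t node[NODES]; } ToyFsmPage;

static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Recompute every inner node from its children, bottom-up.
 * This is the kind of fix-up a vacuum pass could also apply. */
static void toy_rebuild(ToyFsmPage *p)
{
    for (int i = LEAVES - 2; i >= 0; i--)
        p->node[i] = max_u8(p->node[2 * i + 1], p->node[2 * i + 2]);
}

/* Return a leaf slot with at least 'needed' free space, or -1.
 * On a dead end (parent promises space, neither child has it),
 * repair the tree and start over. */
static int toy_search(ToyFsmPage *p, uint8_t needed)
{
    assert(p != 0);
    for (;;)
    {
        int i = 0;

        if (p->node[0] < needed)
            return -1;               /* root says nothing fits */

        while (i < LEAVES - 1)       /* while i is an inner node */
        {
            int l = 2 * i + 1, r = l + 1;

            if (p->node[l] >= needed)
                i = l;
            else if (p->node[r] >= needed)
                i = r;
            else
                break;               /* dead end: upper node was wrong */
        }
        if (i >= LEAVES - 1)
            return i - (LEAVES - 1); /* reached a leaf */

        toy_rebuild(p);              /* tree is consistent after this */
    }
}
```

After one rebuild the tree is consistent, so the retry either finds a leaf
or returns -1; the loop cannot spin forever.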

The FSM in a warm standby gets updated by replay of heap CLEAN WAL
records. That means that the FSM will reflect the situation after the
last vacuum, which is better than what we have now, but not quite
up-to-date. I'm worried that this might not be enough, because on a
large table, a lot of pages could've been filled since the last vacuum, and
the first guy who tries to insert into the table will have to grind
through all those pages, finding that they're all full now. It would be
simple to update the FSM at every heap insert and update record, but
that might add an unacceptable amount of overhead during recovery. Also,
the index FSM is not yet updated during recovery.

The FSM is now extended lazily, so there's no explicit
FreeSpaceMapExtendRel function anymore. To avoid calling smgrnblocks all
the time, I added a field to RelationData to cache the size of the FSM fork.
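
A rough sketch of how lazy extension and a cached fork size can fit
together. Everything here is hypothetical (ToyRel, toy_smgr_nblocks,
toy_fsm_ensure_block, the UNKNOWN_NBLOCKS sentinel); the real field in
RelationData and the real smgr calls look different. The point is only that
the expensive size lookup happens when the cache is unset or when the
requested block appears to be past the end.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNum;
#define UNKNOWN_NBLOCKS ((BlockNum) 0xFFFFFFFF)  /* hypothetical "not cached" marker */

typedef struct
{
    BlockNum on_disk_nblocks;     /* stands in for the storage manager's state */
    unsigned size_lookups;        /* counts the expensive size queries */
    BlockNum cached_fsm_nblocks;  /* hypothetical cache of the FSM fork size */
} ToyRel;

/* Stand-in for an smgrnblocks-style call: expensive, so we count it. */
static BlockNum toy_smgr_nblocks(ToyRel *rel)
{
    rel->size_lookups++;
    return rel->on_disk_nblocks;
}

/* Make sure FSM block 'blk' exists, extending the fork lazily if needed. */
static void toy_fsm_ensure_block(ToyRel *rel, BlockNum blk)
{
    assert(blk != UNKNOWN_NBLOCKS);

    if (rel->cached_fsm_nblocks == UNKNOWN_NBLOCKS)
        rel->cached_fsm_nblocks = toy_smgr_nblocks(rel);

    if (blk < rel->cached_fsm_nblocks)
        return;                   /* fast path: no size lookup at all */

    /* The cache may be stale (someone else extended); recheck first. */
    rel->cached_fsm_nblocks = toy_smgr_nblocks(rel);

    while (rel->cached_fsm_nblocks <= blk)
    {
        rel->on_disk_nblocks++;   /* stands in for writing a zeroed page */
        rel->cached_fsm_nblocks = rel->on_disk_nblocks;
    }
}
```

Once the fork is large enough, repeated calls for existing blocks never
touch toy_smgr_nblocks again.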

The fsm_search_avail() function now emulates the old FSM behavior more
closely. It should now always return the next page to the right of the
next-pointer (wrapping around if necessary).
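
The wrap-around behavior can be sketched as below. This is a simplified
stand-in, not fsm_search_avail() itself: it scans a flat array linearly
instead of walking the in-page tree, and toy_search_from_next is a made-up
name. Two details are assumptions on my part: the scan starts at the slot
the pointer refers to (inclusive), and the pointer is advanced to just past
the returned slot.

```c
#include <assert.h>
#include <stdint.h>

/* Find the first slot at or after *next (wrapping around) that has at
 * least 'needed' free space. On success, advance *next past the slot
 * and return the slot; otherwise leave *next alone and return -1. */
static int toy_search_from_next(const uint8_t *avail, int nslots,
                                int *next, uint8_t needed)
{
    assert(nslots > 0 && *next >= 0 && *next < nslots);

    for (int k = 0; k < nslots; k++)
    {
        int slot = (*next + k) % nslots;   /* wraps to slot 0 after the end */

        if (avail[slot] >= needed)
        {
            *next = (slot + 1) % nslots;
            return slot;
        }
    }
    return -1;                             /* full circle, nothing fits */
}
```

Successive callers get spread across different pages this way instead of
all piling onto the first page with room. Resetting the pointer to 0, as
vacuum now does, steers the next searches back to low-numbered pages.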

I believe I've addressed all of Tom's other comments on the code as
well. Also, the "next-pointer" is now reset in vacuum, to encourage the
use of low-numbered pages, per Bruce's comment.

There's one known bug left. If we crash after truncating a relation, and
the truncation of the FSM doesn't hit the disk but the truncation of the
main fork does, we can end up with an FSM that shows that there's free
space on pages that don't exist anymore. The straightforward way to fix
that is to put back the WAL-logging of FSM truncations, but given that I
just ripped out all the rest of the WAL-logging, does anyone have other
ideas?

TODO:
- Performance testing, again.
- Add test case to regression suite that exercises index FSM
- Fix the crash+truncate bug
- Update index FSM during recovery
- Update FSM more frequently during recovery
- Documentation

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
