Tuesday, September 30, 2008

Re: [HACKERS] Block-level CRC checks

On Tue, Sep 30, 2008 at 2:02 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> A customer of ours has been having trouble with corrupted data for some
> time. Of course, we've almost always blamed hardware (and we've seen
> RAID controllers have their firmware upgraded, among other actions), but
> the useful thing to know is when corruption has happened, and where.

Agreed.

> So we've been tasked with adding CRCs to data files.

Awesome.

> The idea is that these CRCs are going to be checked just after reading
> files from disk, and calculated just before writing it. They are
> just a protection against the storage layer going mad; they are not
> intended to protect against faulty RAM, CPU or kernel.

This is the common case.

> This code would be run-time or compile-time configurable. I'm not
> absolutely sure which yet; the problem with run-time is what to do if
> the user restarts the server with the setting flipped. It would have
> almost no impact on users who don't enable it.

I've supported this forever!

> The implementation I'm envisioning requires the use of a new relation
> fork to store the per-block CRCs. Initially I'm aiming at a CRC32 sum
> for each block. FlushBuffer would calculate the checksum and store it
> in the CRC fork; ReadBuffer_common would read the page, calculate the
> checksum, and compare it to the one stored in the CRC fork.
>
> A buffer's io_in_progress lock protects the buffer's CRC. We read and
> pin the CRC page before acquiring the lock, to avoid having two buffer
> IO operations in flight.

If the CRC gets written before the block, how is recovery going to
handle it? I'm not too familiar with the new forks stuff, but
recovery will pull the old block, compare it against the checksum, and
consider the block invalid, correct?

> I'd like to submit this for 8.4, but I want to ensure that -hackers at
> large approve of this feature before starting serious coding.

IMHO, this is a functionality that should be enabled by default (as it
is on most other RDBMS). It would've prevented severe corruption in
the 20 or so databases I've had to fix, and other than making it
optional, I don't see the reasoning for a separate relation fork
rather than storing it directly on the block (as everyone else does).
Similarly, I think Greg Stark was playing with a patch for it
(http://archives.postgresql.org/pgsql-hackers/2007-02/msg01850.php).

--
Jonah H. Harris, Senior DBA
myYearbook.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: