Tuesday, September 30, 2008

Re: [HACKERS] Block-level CRC checks

> A customer of ours has been having trouble with corrupted data for some
> time. Of course, we've almost always blamed hardware (and we've seen
> RAID controllers have their firmware upgraded, among other actions), but
> the useful thing to know is when corruption has happened, and where.

That is an important statement, to know when it happens not necessarily to
be able to recover the block or where in the block it is corrupt. Is that
correct?

>
> So we've been tasked with adding CRCs to data files.

CRC or checksum? If the objective is merely general "detection" there
should be some latitude in choosing the methodology for performance.

>
> The idea is that these CRCs are going to be checked just after reading
> files from disk, and calculated just before writing it. They are
> just a protection against the storage layer going mad; they are not
> intended to protect against faulty RAM, CPU or kernel.

It will actually find faults in all if it. If the CPU can't add and/or a
RAM location lost a bit, this will blow up just as easily as a bad block.
It may cause "false identification" of an error, but it will keep a bad
system from hiding.

>
> This code would be run-time or compile-time configurable. I'm not
> absolutely sure which yet; the problem with run-time is what to do if
> the user restarts the server with the setting flipped. It would have
> almost no impact on users who don't enable it.

CPU capacity on modern hardware within a small area of RAM is practically
infinite when compared to any sort of I/O.
>
> The implementation I'm envisioning requires the use of a new relation
> fork to store the per-block CRCs. Initially I'm aiming at a CRC32 sum
> for each block. FlushBuffer would calculate the checksum and store it
> in the CRC fork; ReadBuffer_common would read the page, calculate the
> checksum, and compare it to the one stored in the CRC fork.

Hell, all that is needed is a long or a short checksum value in the block.
I mean, if you just want a sanity test, it doesn't take much. Using a
second relation creates confusion. If there is a CRC discrepancy between
two different blocks, who's wrong? You need a third "control" to know. If
the block knows its CRC or checksum and that is in error, the block is
bad.

>
> A buffer's io_in_progress lock protects the buffer's CRC. We read and
> pin the CRC page before acquiring the lock, to avoid having two buffer
> IO operations in flight.
>
> I'd like to submit this for 8.4, but I want to ensure that -hackers at
> large approve of this feature before starting serious coding.
>
> Opinions?

If its fast enough, its a good idea. It could be very helpful in
protecting users data.

>
> --
> Alvaro Herrera
> http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: