Tuesday, September 30, 2008

Re: [HACKERS] Block-level CRC checks

Hello Alvaro,

Some random thoughts from reading your proposal follow...

Alvaro Herrera wrote:
> So we've been tasked with adding CRCs to data files.

Disks keep getting larger while their relative reliability seems to shrink,
so I agree that this is a worthwhile thing to have. But shouldn't that be
the job of the filesystem? Think of ZFS or the upcoming BTRFS.

> The idea is that these CRCs are going to be checked just after reading
> files from disk, and calculated just before writing it. They are
> just a protection against the storage layer going mad; they are not
> intended to protect against faulty RAM, CPU or kernel.

That sounds reasonable if we do it in Postgres rather than in the filesystem.

> This code would be run-time or compile-time configurable. I'm not
> absolutely sure which yet; the problem with run-time is what to do if
> the user restarts the server with the setting flipped. It would have
> almost no impact on users who don't enable it.

I'd say calculating a CRC is cheap enough to count as "no impact".
A single core of a modern CPU easily exceeds 200 MiB/s of CRC32
throughput today; see [1].
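
For reference, a table-driven CRC32 in C is only a few lines, which is part of why it is so cheap. This is a minimal sketch of the standard reflected CRC-32 (polynomial 0xEDB88320, the variant zlib uses). The function names are made up for illustration, the lazy table initialisation is not thread-safe, and production code would use a precomputed or hardware-accelerated routine instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; not taken from PostgreSQL or zlib. */
static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table for the reflected polynomial 0xEDB88320. */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* CRC-32 of buf[0..len): one table lookup, shift and XOR per byte. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (!crc_table_ready)
        crc32_init_table();              /* not thread-safe: sketch only */
    uint32_t crc = 0xFFFFFFFFu;          /* standard initial value */
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;            /* final inversion */
}
```

For example, `crc32_buf((const unsigned char *)"123456789", 9)` returns 0xCBF43926, the usual check value for this CRC variant.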

Maybe consider Adler-32, which is 3-4x faster [2], is also part of zlib
and, AFAIK, is about as safe as CRC32 for blocks of 8k and above.
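
Adler-32 keeps two running sums modulo 65521, the largest prime below 2^16. The sketch below is an illustrative implementation, not zlib's code; like zlib, it defers the modulo to once every 5552 bytes, the longest run for which the 32-bit sums cannot overflow, so the inner loop is just two additions per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD  65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* max bytes before the 32-bit sums could overflow */

/* Adler-32 of buf[0..len); illustrative name, not zlib's API. */
static uint32_t adler32_buf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t a = 1, b = 0;               /* a: sum of bytes + 1, b: sum of a's */
    while (len > 0) {
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= chunk;
        while (chunk--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_MOD;                  /* reduce once per chunk, not per byte */
        b %= ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For the string "Wikipedia" this returns 0x11E60398; an empty buffer yields 1.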

> The implementation I'm envisioning requires the use of a new relation
> fork to store the per-block CRCs. Initially I'm aiming at a CRC32 sum
> for each block. FlushBuffer would calculate the checksum and store it
> in the CRC fork; ReadBuffer_common would read the page, calculate the
> checksum, and compare it to the one stored in the CRC fork.
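
To make the proposed flow concrete, here is a hypothetical in-memory simulation. `main_fork` stands for the relation's data file and `crc_fork` for the separate CRC fork, while `flush_page` and `read_page` are illustrative stand-ins for FlushBuffer and ReadBuffer_common, not their real signatures. A simple bitwise CRC-32 keeps the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ  8192   /* PostgreSQL's default block size */
#define NBLOCKS 4      /* tiny hypothetical relation */

/* Simulated storage: the data fork plus a separate "CRC fork", one CRC per block. */
static unsigned char main_fork[NBLOCKS][BLCKSZ];
static uint32_t      crc_fork[NBLOCKS];

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t fork_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Hypothetical FlushBuffer stand-in: checksum the page, then write page and CRC. */
static void flush_page(uint32_t blkno, const unsigned char *page)
{
    assert(blkno < NBLOCKS);
    crc_fork[blkno] = fork_crc32(page, BLCKSZ);   /* write #1: CRC fork */
    memcpy(main_fork[blkno], page, BLCKSZ);       /* write #2: data fork */
}

/* Hypothetical ReadBuffer_common stand-in: read, recompute, compare. */
static bool read_page(uint32_t blkno, unsigned char *page)
{
    assert(blkno < NBLOCKS);
    memcpy(page, main_fork[blkno], BLCKSZ);
    return fork_crc32(page, BLCKSZ) == crc_fork[blkno];
}
```

Note that `flush_page` performs two independent writes, one to each fork; on real storage a crash between them leaves a page and its CRC out of step, which is the crash-safety question raised in the reply.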

Huh? Aren't CRCs normally stored as part of the block they are supposed
to protect? Otherwise, how do you expect to ensure that the data in the
CRC relation fork itself is correct? And what about crash safety (a data
block written, but not its CRC block, or vice versa)?
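
The more common alternative is to keep the checksum inside the page itself. In this sketch the first four bytes of an 8 KiB page are assumed to hold the checksum (a hypothetical layout, not PostgreSQL's actual page header), and the CRC covers the remaining bytes so that the stored value never feeds into its own computation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 8192   /* assumed page size */
#define CRC_BYTES  4      /* hypothetical: checksum occupies bytes 0..3 */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t page_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Checksum everything except the checksum field itself. */
static uint32_t page_compute_crc(const unsigned char *page)
{
    assert(page != NULL);
    return page_crc32(page + CRC_BYTES, PAGE_BYTES - CRC_BYTES);
}

/* Called just before writing: store the CRC in the page header. */
static void page_set_crc(unsigned char *page)
{
    uint32_t crc = page_compute_crc(page);
    memcpy(page, &crc, CRC_BYTES);
}

/* Called just after reading: compare stored and recomputed CRC. */
static bool page_verify_crc(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, CRC_BYTES);
    return stored == page_compute_crc(page);
}
```

Because data and checksum travel in the same block, one write updates both and no extra seek to a CRC fork is needed; a partially written page would still fail verification on the next read.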

Wouldn't that double the amount of seeking required for writes?

> I'd like to submit this for 8.4, but I want to ensure that -hackers at
> large approve of this feature before starting serious coding.

Very cool!

Regards

Markus Wanner

[1]: Crypto++ benchmarks:
http://www.cryptopp.com/benchmarks.html

[2]: Wikipedia about hash functions:
http://en.wikipedia.org/wiki/List_of_hash_functions#Computational_costs_of_CRCs_vs_Hashes

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
