Wednesday, June 25, 2008

Re: [PERFORM] Hardware vs Software Raid

"Also sprach Merlin Moncure:"
> write back: raid controller can lie to host o/s. when o/s asks

This is not what the Linux software raid controller does, then. It
does not queue requests internally at all, nor ack requests that have
not already been acked by the components (modulo the fact that one can
deliberately choose to have a slow component not be synchronous by
allowing "write-behind" on it, in which case the "controller" will ack
the incoming request after one of the components has been serviced,
without waiting for both).
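
The acknowledgement rule described above can be sketched as a toy model.
This is not the md driver's code: the function name, the latency figures
and the requirement that at least one component stays synchronous are
all assumptions made for illustration.

```python
def ack_time(latencies, write_behind):
    """Time at which a mirrored write is acked to the layer above.

    latencies    -- per-component service times (arbitrary units)
    write_behind -- per-component flags; True means "do not wait for it"

    Toy model: the request is acked once every component that is NOT
    flagged write-behind has acked. Nothing is acked early otherwise.
    """
    if len(latencies) != len(write_behind):
        raise ValueError("one flag per component is required")
    waited_for = [t for t, wb in zip(latencies, write_behind) if not wb]
    if not waited_for:
        # Assumption of this sketch: some component must be synchronous.
        raise ValueError("at least one component must be synchronous")
    return max(waited_for)


# Two-way mirror: a fast local disk (5) and a slow component (50).
print(ack_time([5, 50], [False, False]))  # waits for both components
print(ack_time([5, 50], [False, True]))   # slow component is write-behind
```

In the first call the ack comes only when the slower component has
finished; in the second it comes as soon as the fast one has.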

> integrity and performance. 'write back' caching provides insane burst
> IOPS (because you are writing to controller cache) and somewhat
> improved sustained IOPS because the controller is reorganizing writes
> on the fly in (hopefully) optimal fashion.

This is what is provided by the Linux file system and (ordinary) block
device driver subsystems. It is deliberately eschewed by the soft raid
driver, because any caching will already have been done above and below
the driver, either in the FS or in the components.

> > However the lack of extra buffering is really deliberate (double
> > buffering is a horrible thing in many ways, not least because of the
>
> <snip>
> completely unconvincing.

But true. Therefore the problem in attaining conviction must be at your
end. Double buffering just doubles the resources dedicated to a single
request, without doing anything for it! It doubles the frequency with
which one runs out of resources, it doubles the frequency of the burst
limit being reached. It's deadly (deadlockly :) in the situation where
the receiving component device also needs resources in order to service
the request, such as when the transport is network tcp (and I have my
suspicions about scsi too).
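
A toy model of that resource argument, under loudly stated assumptions:
one shared pool of identical buffers, an upper layer that greedily holds
`copies` buffers per admitted request, and a receiving component that
needs one buffer from the same pool to service a request. None of these
names or numbers come from the kernel; they are hypothetical.

```python
def completed_before_stall(pool, requests, copies):
    """Count requests that complete before progress stops.

    pool     -- total buffers in the shared pool
    requests -- size of the incoming burst
    copies   -- buffers held per request above the component
                (1 = single buffering, 2 = double buffering)
    """
    free, waiting, held, done = pool, requests, 0, 0
    while True:
        # The upper layer admits as many requests as it can find buffers for.
        admit = min(waiting, free // copies)
        free -= admit * copies
        waiting -= admit
        held += admit
        if held == 0:
            return done  # nothing in flight: finished (or nothing fits)
        if free == 0:
            return done  # component cannot get a buffer: stall
        # The component takes one buffer, services one request, and then
        # that buffer plus the request's own buffers go back to the pool.
        free += copies
        held -= 1
        done += 1


print(completed_before_stall(pool=8, requests=4, copies=1))  # burst survives
print(completed_before_stall(pool=8, requests=4, copies=2))  # burst stalls
```

With the same pool and the same burst, the double-buffered case runs out
of buffers before the component can service anything, which is the
halved burst limit described above.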

> the overhead of various cache layers is
> completely minute compared to a full fault to disk that requires a
> seek which is several orders of magnitude slower.

That's absolutely true when by "overhead" you mean "computation cycles"
and absolutely false when by "overhead" you mean "memory resources", as I
do. Double buffering is a killer.

> The linux software raid algorithms are highly optimized, and run on a

I can confidently tell you that that's balderdash, both as a Linux author
and as a Linux software RAID author (check the attributions in the
kernel source, or look up something like "Raiding the Noosphere" on
Google).

> presumably (much faster) cpu than what the controller supports.
> However, there is still some extra oomph you can get out of letting
> the raid controller do what the software raid can't...namely delay
> sync for a time.

There are several design problems left in software raid in the Linux kernel.
One of them is the need for extra memory to dispatch requests with and
as (i.e. buffer heads and buffers, both). bhs should be OK since the
small cache per device won't be exceeded while the raid driver itself
serialises requests, which is essentially the case (it does not do any
buffering, queuing, whatever ... and tries hard to avoid doing so). The
need for extra buffers for the data is a problem. On different
platforms different aspects of that problem are important (would you
believe that on ARM mere copying takes so much cpu time that one wants
to avoid it at all costs, whereas on Intel it's a forgettable trivium?).

I also wouldn't absolutely swear that request ordering is maintained
under ordinary circumstances.

But of course we try.


Peter

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
