Monday, September 22, 2008

Re: [HACKERS] Initial prefetch performance testing

Ron Mayer writes:

> For example, on our sites hosted with Amazon's compute cloud (a great
> place to host web sites), I know nothing about spindles, but know
> about Amazon Elastic Block Store[2]'s and Instance Store's[1]. I
> have some specs and am able to run benchmarks on them; but couldn't
> guess how many spindles my X% of the N-disk device corresponds
> to.

Well I don't see how you're going to guess how much prefetching is optimal for
those environments either...

> For another example, some of our salesguys with SSD drives
> have 0 spindles on their demo machines.

Sounds to me like you're finding it pretty intuitive. Actually you would want
"1" because it can handle one request at a time. Actually if you have a
multipath array I imagine you would want to think of each interface as a
spindle because that's the bottleneck and you'll want to keep all the
interfaces busy.

> I'd rather a parameter that expressed things more in terms of
> measurable quantities -- perhaps seeks/second? perhaps
> random-access/sequential-access times?

Well that's precisely what I'm saying. Simon et al want a parameter to control
how much prefetching to do. That's *not* a measurable quantity. I'm suggesting
effective_spindle_count which *is* a measurable quantity even if it might be a
bit harder to measure in some environments than others.

The two other quantities you describe are both currently represented by our
random_page_cost (or random_page_cost/seq_page_cost). What we're
dealing with now is an entirely orthogonal property of your system: how many
concurrent requests the system can handle.

If you have ten spindles then you really want to send enough requests to
ensure there are ten concurrent requests being processed on ten different
drives (assuming you want each scan to make maximum use of the resources which
is primarily true in DSS but might not be true in OLTP). That's a lot more
than ten requests though because if you sent ten requests many of them would
end up on the same devices.
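
To put a rough number on that collision effect, here is a minimal sketch. It
assumes each outstanding request lands on a drive chosen uniformly and
independently, which is an illustrative simplification and not something
measured in this thread; the function name is hypothetical.

```python
def expected_busy_drives(drives: int, requests: int) -> float:
    """Expected number of distinct drives hit by `requests` outstanding requests.

    Assumption (illustrative only): every request picks its drive uniformly
    and independently of the others.
    """
    if drives <= 0:
        raise ValueError("drives must be positive")
    if requests < 0:
        raise ValueError("requests must be non-negative")
    # A given drive is missed by all requests with probability (1 - 1/drives)**requests.
    p_idle = (1.0 - 1.0 / drives) ** requests
    return drives * (1.0 - p_idle)


if __name__ == "__main__":
    # Compare a few queue depths for a ten-drive array.
    for depth in (10, 30, 100):
        print(depth, round(expected_busy_drives(10, depth), 2))
```

Under that assumption, ten requests against ten drives keep only about 6.5
drives busy on average, which is why the target queue depth has to be
noticeably larger than the spindle count.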

In theory my logic led me to think for ten drives it would be about 30.
Experiments seem to show it's more like 300-400. That discrepancy might be a
reason to put this debate aside for now anyways and expose the internal
implementation until we understand better what's going on there.
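
The reasoning behind the "about 30" figure is not spelled out here. One model
that gives a number close to it is the coupon-collector expectation: the
average number of uniformly random requests needed before every one of n
drives has received at least one, which is n * (1 + 1/2 + ... + 1/n). The
sketch below computes that under the uniform-random assumption; the function
name is hypothetical and this is not presented as the actual derivation.

```python
def requests_to_touch_all_drives(drives: int) -> float:
    """Expected number of uniformly random requests before every drive
    has received at least one (coupon-collector expectation, n * H_n).

    Assumption (illustrative only): requests are spread uniformly and
    independently across the drives.
    """
    if drives <= 0:
        raise ValueError("drives must be positive")
    # Once i - 1 drives have been hit, hitting a new one takes on average
    # drives / (drives - i + 1) further requests; summing over i gives
    # drives * (1/1 + 1/2 + ... + 1/drives).
    return sum(drives / i for i in range(1, drives + 1))


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, round(requests_to_touch_all_drives(n), 2))
```

For ten drives this comes to roughly 29.3, in line with the theoretical
estimate and about an order of magnitude below the 300-400 seen in the
experiments.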

Ironically I'm pretty happy to lose this argument because EDB is interested in
rolling this into its dynamic tuning module. If there's a consensus -- by my
count three people have spoken up already which is more than usual -- then
I'll gladly concede. Anyone object to going back to preread_pages? Or should
it be prefetch_pages? prefetch_blocks? Something else?

Gregory Stark
Ask me about EnterpriseDB's Slony Replication support!

Sent via pgsql-hackers mailing list