Thursday, June 12, 2008

Re: [HACKERS] Overhauling GUCS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


>> Really? I'm the opposite: I never leave a client's setting at 10, that's
>> just asking for trouble. Making it 100 *after* you encounter problem
>> queries is reactive; I prefer being proactive.

> Have you ever measured the system speed before and after?

Yes. No change (see below on caching) or faster (better plans).

>> Nor is a setting of 10 "perfectly adequate":

> What percentage of your plans actually change with the larger statistics? How
> many for the better? How many were massively improved?

It matters not if there is a slight increase in planning time: the penalty
of choosing a Bad Plan far outweighs any increased analyze or planning
cost, period. Are you arguing that 10 is a good default, or just against
larger values in general?

> I suspect you're looking at some single-digit percentage slowdown for planning
> across the board. In exchange if you a) have simple queries you probably see
> none improving. If you b) have moderately complex queries you probably get
> some single-digit percentage of them with improvements. And if you c) have
> very complex queries you probably have a handful of them which see massive
> improvements. Across the internet there are a whole lot more applications of
> type (a) than the others...

I'm still skeptical that it's the case, but I wouldn't mind seeing some figures
about how slowed down a "simple" database gets going from 10 to 100 (or larger).
Robert, any chance we can use Pagila for some sort of test for that?

>> Frankly, I'd be shocked if there is any significant difference and all
>> compared to the actual query run time.

> Well you might start preparing to be shocked. Note that retrieving the
> statistics is a query itself so it's not hard for it to be comparable to a
> similarly simple query. It's not hard for a simple query using multiple
> columns to be using more records of statistics than it is from the actual
> data. And things can look much worse if that data is TOASTed and requires
> further lookups and/or decompression...

Even if all you say above is true, and I think we'll have to agree to disagree
on that, there's an important point to remember: query plans can be (and very
often are) cached. Queries and query results cannot (although I'm working
on that... :) Plans to queries is a 1-N, where N can be very, very large,
and further boosts the query time vs planning time ratio.

...
> 100 is no longer a magic threshold for LIKE queries (in CVS HEAD)

That's great, but doesn't help existing releases (unless that was backpatched,
but I don't recall it if so) But that's a battle I'm going to stop fighting,
and concentrate on helping to find a replacement for 10 that may or may not
be 100.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200806122100
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkhRyXIACgkQvJuQZxSWSshZpwCeOdLZCu0gSQGpOLciQ6H29Tsd
iNgAn3AuoisE8zSbMjLuDL4aWzP6NAth
=ujTa
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: