> It seems to me that if the correlation is 0.99, and you're
> looking for less than 1% of rows, the expected rows may be at the
> beginning or at the end of the heap?
Not necessarily. Imagine for example that you have a table with 1M rows,
and one of the fields has unique values from 1 to 1M, and the rows are
ordered in the table by that field. So the correlation would be 1. If you
were to SELECT from the table WHERE the field = 500000 LIMIT 1, then the
database should be able to work out that the rows will be right in the
middle of the table, not at the beginning or end. It should set the
startup cost of a sequential scan to the amount of time required to
sequential scan half of the table.
Of course, this does bring up a point - if the matching rows are
concentrated at the end of the table, the database could perform a
sequential scan backwards, or even a scan from the middle of the table
onwards.
This improvement of course only actually helps if the query has a LIMIT
clause, and presumably would muck up simultaneous sequential scans.
Matthew
--
Picard: I was just paid a visit from Q.
Riker: Q! Any idea what he's up to?
Picard: No. He said he wanted to be "nice" to me.
Riker: I'll alert the crew.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
No comments:
Post a Comment