Sunday, August 31, 2008

Re: [GENERAL] query with offset stops using index scan

On Sun, Aug 31, 2008 at 7:14 AM, Stanislav Raskin <sr@brainswell.de> wrote:
> Hello everybody,
>
> Now, if I increase OFFSET slowly, it works all the same way, until OFFSET
> reaches the value of 750. Then, the planner refuses to use an index scan and
> does a plain seq scan+sort, which makes the query about 10-20 times slower:
>
> I use 8.1.4, and I did a vacuum full analyze before running the queries.

If there's a chance to upgrade to 8.3 please do so. While 8.1 was a
solid reliable workhorse of a database, there's been a lot of work
done in general for better performance and more features. It likely
won't fix this one problem, but it's often smarter about corner cases
in query plans than 8.1 so it's worth looking into.

Now back to your problem. What's happening here is that the query
planner is switching plans because it thinks the sequential scan and
sort are cheaper. and at some point it will likely be right. That's
because a random page cost is much higher than a sequential page cost.
So at some point, say when you're grabbing 2% to 25% of a table, it
will switch to sequential scans.

Now, if the data is all cached, then it's still quicker to do the
index scan further along than to use a seq scan and a sort. Unless
your table is clustered to the index you're sorting on, a Seq scan
will almost always win if you need the whole table.

However, you may be in a position where a multi-column index and
clustering on id will allow you to run this offset higher. It's still
a poor performer for large chunks of large tables.

first cluster on the primary key id, then create a three column index
for (active, valid_until, locked) Note that the order should be from
the most choosey to least choosey column, generally. So assuming only
a tiny percentage of records meet valid_until, make it the first
column, and so forth. A query like:

select active, count(active) from table group by active;

will give you an idea there.

In the long run if you want good performance on larger data sets (i.e.
higher offset numbers) you'll likely need to switch to either cursors,
or using "where id between x and x+y" or lookup tables, or something
like that.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: