Monday, May 26, 2008

Re: [PERFORM] "Big O" notation for postgres?

Gregory Stark wrote:
> "Richard Huxton" <dev@archonet.com> writes:
>
>
>> Jonah H. Harris wrote:
>>
>>> On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:
>>>
>>>> Does anyone know if there is a source that provides "Big O" notation for
>>>> postgres's aggregate functions and operations? For example is count(*) =
>>>> O(1) or O(n)?
>>>>
>>> I don't know of any document containing the complexity of each
>>> aggregate, but it's sometimes left as a comment in the souce code.
>>>
>> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
>> presence of an index too, just to muddy the waters.
>>
When I first read the above from Jonah I just assumed some Postgres
magic was producing O(1). After seeing this again, I believe that
Postgres must be doing something like the following for max and min:
Max: ORDER BY colName DESC LIMIT 1
Min: ORDER BY coName ASC LIMIT 1

Thus Jonah's caveat about using an index. If postgres is using an index
as in the above then the Max and Min functions would both be O(log N) ,
this is log base2 of N, which is the time it takes to search a balanced
binary tree, not O(1) which implies a constant time to perform,
regardless of the size of the dataset N.

>
> Hm, true. But excluding those special cases all Postgres aggregate functions
> will be O(n) unless they're doing something very odd. None of the built-in
> functions (except min/max as noted) do anything odd like that.
>
> The reason way is because of the basic design of Postgres aggregate functions.
> They are fed every tuple one by one and have to keep their state in a single
> variable. Most of the aggregate functions like count(*) etc just keep a static
> non-growing state and the state transition function is a simple arithmetic
> operation which is O(1). So the resulting operation is O(n).
>
> Actually one exception would be something like
>
> CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');
>
> Since the state variable has to keep accumulating more and more stuff the
> array_append becomes more and more expensive (it has to generate a new array
> so it has to copy the existing stuff). So actually it woul dbe O(n^2).
>
> The only builtin aggregate which looks like it falls in this category would be
> xmlagg()
>
>
Functions with O(N^2) scale very badly of course.

It would be very handy if the Planner could kick out Big O with its
estimates. This would allow one to quickly tell how a query scales with
a large number of rows.

Thanks,
HH

--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

No comments: