Sunday, July 6, 2008

Re: [HACKERS] introduction of WIP window function patch

2008/7/6 Simon Riggs <simon@2ndquadrant.com>:
>> I think there are two types of functions for windowed mode.
>> - windowed aggregate
>> this type of function is exactly same as normal aggregate. So we use
>> functions that have been in pgsql already. Actually in my patch above,
>> I didn't introduce any new function. This type of function includes
>> simply sum(), avg(), etc. which returns same values on a partition or
>> a window frame.
>>
>> - windowed function
>> this is the NEW type of function. I guess we should add a new function
>> type to pgsql. This type of function includes rank(), rank_dense(),
>> row_number(), etc. Windowed functions returns different values per
>> tuple.
>>
>> The difference between two types is if the function returns the same
>> value during a partition or different values.
>>
>> So, windowed aggregate and normal aggregate overlap each other. How
>> you know which one is that you see OVER clause in SQL just after the
>> function call. When you see OVER after func(), and pg_proc says it's
>> an aggregate, it's a windowed aggregate. Otherwise, it's a windowed
>> function.
>>
>> If I misunderstood about those definitions please correct me.
>
> Yes, I understand that and I think Martijn does also.
>
> I've done some thinking and rooting around on this and I think I have a
> different proposal for you, different to what we just discussed.
>
> SQL2008 specifies window functions as
>
> * rank functions
> * distribution functions: percent_rank() and cume_dist()
> * rownumber()
> * ntile()
> * lead() and lag()
> * first, last and n-th value functions
> * inverse distribution functions (similar to n-th value, based upon
> distribution function results)
>
> plus window aggregate functions (the normal aggregates COUNT, SUM etc)
>
> Now looking through all of those, I don't see *any* window functions
> that need access to different datatypes, or actually need to see the
> values of the attributes.
>
> The normal aggregates work with windows identically to the way they do
> without windows, so no change needed there.
>
> AFAICS we could define all of the non-aggregate window functions on the
> above list *without* defining them as functions in pg_proc. That would
> be a benefit because the window functions are very powerful and we'd
> need to give them access to any/all tuples in the window.
>
> So that would mean we don't provide a mechanism for user-defined
> windowed aggregate functions at all. Which solves the discussion about
> how to pass generic info through to them (at least long enough to get
> the first implementation done).
>
> We do already have such functions in code, e.g. greatest(). Sure they
> need to be defined in code, but we don't need to come up with a generic
> API for them.
>
> If you disagree, think about how we'd implement lag() or ntile() and
> what info we'd need to pass them.

Well, your idea is one of considerable choices. But I like pgsql's
extensibility that enables pgsql more powerful DBMS. So, I design it
as you propsed though trying to unify the function form somehow.

Just idea, how about pass window object to a function? We'll provide
window operation API then in the function you take window object
through fcinfo:

Datum func(PG_FUNCTION_ARGS)
{
Datum v;
WindowObject w = get_window(fcinfo);
HeapTuple htup_current = window_current_row(w);
HeapTuple htup_prev = window_preceding(w, 1);
/* do something */
PG_RETURN_DATUM(v);
}

so that a function access whole the window. APIs include
- current row
- preceding row
- following row
- current key
- preceding key
- following key
- iterate for the window
where "key" means ORDER BY values in OVER clause. Fortunately, my
patch uses tuplestore/tuplesort to create window, which allows random
access operation such above. Is there security/performance issue about
this?

--
Hitoshi Harada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: