> Is there security/performance issue about this?
Performance, yes.
If we need access to more rows than will fit within work_mem we have a
problem and will need to restart sort. Giving random access to all
tuples in the current window, whatever its size would be very costly,
which is why we have optimized that access for merge joins. So we need
to know how far back access is required, if any - think of that as an
"access window" definition.
For example,
rownumber() doesn't need access to prior tuples at all.
lag(col, 1) requires access only to the prior row of the current window
ntile() needs to know the size of the window before we begin processing
In some cases the window itself is redefined for each tuple, e.g.
avg() over (order by ... range between 5 preceeding and current row)
In that case, we want the tuples no longer in the window to scroll out
of memory. We already have the mechanism for this: a dynamic tuplestore
(materialize node) in front of the tuplesort (sort node).
Most of that tuning can be done after the initial implementation, but my
point here is this: there needs to be a mechanism by which the window
access requirements can be specified for a function so the executor can
understand how to optimise access. So if you go the route of defining an
extensible API then you must include this also.
I know I rattle on about performance, but with window functions it will
be critical to their usability to have them perform well. We can already
do the types of analysis that window functions allow, it just requires
hand written procedures to do it. So the window functions must perform
acceptably well against very large tables (i.e. much bigger than
memory).
--
Simon Riggs
PostgreSQL Training, Services and Support
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
No comments:
Post a Comment