Wednesday, August 6, 2008

Re: [HACKERS] Status of DISTINCT-by-hashing work

2008/8/6 Tom Lane <tgl@sss.pgh.pa.us>:
> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> There are still two places in the system that hard-wire the use of
>>> sorting for duplicate elimination:
>>>
>>> * Set operations (UNION/INTERSECT/EXCEPT)
>
>> Egads. Are you thinking to reimplement them more in line with the way other
>> nodes work? Or just have them choose between hashing and sorting themselves?
>
> Well, actually, after looking closer I'm realizing that it's harder than
> I thought. I had been thinking that we could just have the planner
> choose whether to generate grouping instead of sorting nodes, but that
> only works for plain UNION. For INTERSECT/EXCEPT (with or without ALL),
> you really need to maintain counters in each hashtable entry so you know
> how many matching rows you got from each side of the set operation.
> So it'd be necessary to either duplicate a large chunk of nodeAgg.c, or
> make that code handle hashed INTERSECT/EXCEPT along with all its
> existing duties. Neither of which seems particularly appealing :-(.
> I'm going to look at whether nodeAgg can be refactored to avoid this,
> but I'm feeling a bit discouraged about it at the moment.

In working on window functions, I also found that nodeWindow.c
duplicates much of nodeAgg.c, which contains not only aggregates but
reading ahead until next group.

Additionally, not having implemented but planned, frame concept that
slides aggregates within a partition will require multiple saved
positions of tuplestore. Up to now Tuplestore has functionality to
mark/restore pos but it is only one chance, which means when you mark
a pos the previous pos cannot be restore anymore. The window frame
will need to do mark multiple times and to restore older ones.

>> Any idea what would the needed executor infrastructure look like? Would it
>> have anything in common with the OLAP window functions infrastructure?
>
> Possibly; I haven't paid much attention to the OLAP work yet.
>
> regards, tom lane

In my patch nodeWindow.c, some functions reach for its parent state
node to get info of sort keys by using fcinfo->context. This works but
is completely ugly. At least, window functions need ability to reach
for key or whole tuple of current or of offset (preceding/following)
row. If another feature like DISTINCT needs similar one, I am
encouraged to give more opinion.

Regards,

--
Hitoshi Harada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: