guaranteed to be included in the contrib? If there is no guarantee,
seems the time of clearing will be wasted. (5 years ago I have already
cleaned one open-source library on demand and after that it was not
approved for PEAR repository, so now I am more circumspect, sorry :-).
So - is the approval procedure finished? If not, who could make the
final decision ("be or not to be")? Sorry, I don't yet know well the
patch proposal procedure...
P.S.
1. Unfortunately GROUP BY + ORDER BY is sometimes 1000 times (!)
slower than _int_group_count_sort. Because of that I have created this
function.
2. I have to assume that the input is sorted in functions, because
else the performance is lowered very much. Sort operation is quite
expensive; checking if an array is sorted or not is also quite
expensive... So I think it is not a good idea to add
sorting-independency to functions.
3. Seems the conversion of functions to anyarray/anyelement is much
more time-expensive than simply including it to specialized
intarray/intagg. Is it absolutely necessery?
On Mon, Sep 15, 2008 at 4:38 PM, Markus Wanner <markus@bluegap.ch> wrote:
> Hi,
>
> sorry for not having completed this review, yet. As you are obviously
> looking at the patch as well, I'll try to quickly write down my points so
> far.
>
> Trying to compile the intarray module, I now receive an error:
>
> error: 'INT4OID' undeclared (first use in this function)
>
> That can be solved by including "catalog/pg_type.h" from
> contrib/intarr/_int_op.c.
>
>
> The PG_FUNCTION_INFO_V1 and prototype definition certainly belong to the top
> of the file, where all others are.
>
> Some lines are longer than 80 columns and again comments are a bit sparse or
> even useless (no "additional things", please).
>
>
> Heikki Linnakangas wrote:
>>
>> I find it a bit unfriendly to have a function that depends on sorted
>> input, but doesn't check it. But that's probably not a good enough reason to
>> reject an otherwise simple and useful function. Also, we already have uniq,
>> which doesn't strictly speaking require sorted input, but it does if you
>> want to eliminate all duplicates from the array.
>
> I think it's a performance optimization which is absolutely required in some
> cases. Some time ago I've also had to rip out the sorting step from certain
> intarray module functions to save processing time.
>
> One option already mentioned somewhere would be saving a 'sorted' property
> for the array. Then again, I think such a thing would certainly have to be
> done globally, for all kinds of arrays.
>
>> _int_group_count_sort seems a bit special purpose. Why does it bother to
>> sort the output? That's wasted time if you don't need sorted output, or if
>> you want the array sorted by the integer value instead of frequency. If you
>> want sorted output, you can just sort it afterwards.
>
> Agreed. IMO the normal GROUP BY and ORDER BY stuff of the database itself
> should be used for such a thing. However, that means turning an array into a
> set of rows...
>
>> Also, it's requiring sorted input for a small performance gain, but
>> there's a lot more precedence in the existing intarray functions to not
>> require sorted input, but to sort the input instead (union, intersect, same,
>> overlap).
>
> ..and exactly these are the functions I had to wrap again to strip the
> sorting step, due to poor performance for known-sorted arrays.
>
>> I realize that the current implementation is faster for the use case where
>> the input is sorted, and output needs to be sorted, but if we go down that
>> path we'll soon have dozens of different variants of various functions, with
>> different ordering requirements of inputs and outputs.
>
> Agreed.
>
> However, given the OP is using that in production, there seems to be a use
> case for the optimization, where we have none for the same function without
> it.
>
>> So, I'd suggest changing _int_group_count_sort so that it doesn't require
>> sorted input, and doesn't sort the output. The binary search function looks
>> good to me (I think I'd prefer naming it bsearch(), though, though I can see
>> that it was named bidx in reference to the existing idx function). Also, as
>> Markus pointed out, the SGML docs need to be updated.
>
> As is, it should probably also carry the '_int' prefix, because it's not a
> general purpose array function. So propose to name it '_int_bsearch'.
>
> Overall I think these functions are overly specialized and should be
> replaced be more general counterparts in core. However, until we have that,
> it's hard to refuse such a thing for contrib.
>
> By that reasoning, however, the intarray would have to provide methods for
> sorted as well as asorted input arrays as well.
>
> I'm closing my part of reviewing of this patch now. Dmitry, how do you want
> to proceed with these patches? Do you have time for some cleaning up and
> writing documentation?
>
> Regards
>
> Markus Wanner
>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
No comments:
Post a Comment