Monday, August 25, 2008

Re: [HACKERS] IN, BETWEEN, spec compliance, and odd operator names

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sun, Aug 24, 2008 at 09:24:23PM -0400, Tom Lane wrote:
>> My own feeling is that we should avoid imputing particular semantics
>> to particular operator names, and so these constructs should always be
>> defined by reference to operators found in a default opclass for the
>> datatype, rather than by specific operator names.

> ISTM the problem is that there's no easy way to refer to "operators
> found in a default opclass", so perhaps we could invent a construct:

> A OPERATOR(btree,2) B

Huh? I don't understand why you think we need to expose this to users.
A user would presumably just write the name of the operator he wants,
if he's writing out a direct operator call.

To me the issue is what we consider IN and BETWEEN and similar
constructs to "mean", which in a datatype world boils down to choosing
which of the datatype's operators to implement the construct with.

> The problem is inferring the type, if A and B are
> different types, which operator class do you use?

Yeah, the cross-type problem occurred to me this morning too. For
instance consider

int4_var BETWEEN int8_var AND numeric_var

The current implementation of BETWEEN acts fairly sanely in this case;
although it ends up choosing two entirely unrelated operators (int48_ge
and numeric_le, which are not in any of the same operator families),
which is not great for subsequent optimization purposes. I'm not sure
about how we'd make a good choice using an opclass-driven approach.
Would we want to insist that both operators are found in the same
family? Perhaps so, because otherwise it's not real clear that you've
created a meaningful range constraint. Yet it's definitely possible
that such a requirement would cause the query to fail where it used to
work (for some value of "work").

Another way to approach it would be to consider the problem as being
similar to overloaded-function resolution, ie think of it as trying
to match an implicit or explicit function between(anyelement,
anyelement, anyelement). But that would fail to take into account
whether there are actually any suitable comparison operators for
whichever common type it chooses.

There's also an issue of not wanting to coerce variables unnecessarily.
In the above example, suppose int4_var has an index. The derived
clause int4_var int48_ge int8_var will be indexable using that index,
whereas int4_var::numeric numeric_le numeric_var won't be, If we coerce
all three inputs to numeric to enforce that we have consistent semantics
on both ends of the range check, then neither end will be indexable;
which seems like a step backwards.

So it's all a bit harder than it looks. Thoughts?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: