Sunday, August 24, 2008

[HACKERS] IN, BETWEEN, spec compliance, and odd operator names

I was looking just now at gram.y's handling of various peculiar SQL
constructs, and was reminded of a point that's bothered me before,
but I don't recall if it's ever been discussed explicitly on -hackers.

As an example, take the production for BETWEEN ASYMMETRIC:

a_expr BETWEEN opt_asymmetric b_expr AND b_expr
{
$$ = (Node *) makeA_Expr(AEXPR_AND, NIL,
(Node *) makeSimpleA_Expr(AEXPR_OP, ">=", $1, $4, @2),
(Node *) makeSimpleA_Expr(AEXPR_OP, "<=", $1, $6, @2),
@2);
}

Okay, this is a pretty direct implementation of how SQL99 defines the
construct: 8.3 <between predicate> syntax rule 6 saith

"X BETWEEN ASYMMETRIC Y AND Z" is equivalent to "X>=Y AND X<=Z".

But it leaves me feeling dissatisfied. What if the datatype has
standard comparison operators (as identified by a default btree opclass)
but they're not named ">=" and "<=" ? Perhaps more plausibly, what if
those operators exist but aren't in the current search path?

The production for NOT BETWEEN is even more troubling:

a_expr NOT BETWEEN opt_asymmetric b_expr AND b_expr
{
$$ = (Node *) makeA_Expr(AEXPR_OR, NIL,
(Node *) makeSimpleA_Expr(AEXPR_OP, "<", $1, $5, @2),
(Node *) makeSimpleA_Expr(AEXPR_OP, ">", $1, $7, @2),
@2);
}

I can't object too much to the hardwired application of DeMorgan's law
(NOT (A AND B) => (NOT A) OR (NOT B)) but what this also has is a
hardwired assumption that "<" and ">" exist and are the negators of
">=" and "<=" respectively. Probably true, but let's see you find
chapter and verse in the SQL spec to support that...


Seems to me that what this boils down to is whether we want to read the
spec literally ("it says the construct is defined in terms of operators
named >= and <=, therefore we should do that") or by intent (obviously
what they *want* is a construct that behaves sensibly in terms of the
datatype's semantics).

We are more than a bit schizophrenic on this point --- in different
parts of the system you can find these things being done both ways.
There is plenty of code that insists on finding a default btree opclass
to define notions of "less" or "greater"; but we have these purely
name-based transformations in gram.y, and I think there are some other
parts of the parser that do similar things.

I'm not particularly eager to start changing things in this area right
now, but it seems to me that it'd be a good idea to establish a project
policy about what we consider to be the preferred behavior, with an eye
to eventually migrating the parts of the system that don't conform.

My own feeling is that we should avoid imputing particular semantics
to particular operator names, and so these constructs should always be
defined by reference to operators found in a default opclass for the
datatype, rather than by specific operator names. However, that way
will likely take more code and cycles to implement than purely
name-based definitions; and there is also the argument that it violates
the in-so-many-words definitions given by the spec.

Comments?

regards, tom lane

PS: there are some other issues here, like whether BETWEEN should be
allowed to cause double evaluation of its left-hand argument, and
whether we wouldn't like it to get reverse-listed by ruleutils.c
in the original BETWEEN format rather than as an expanded version.
However, what I'd like to focus on in this particular thread is the
narrow issue of defining the constructs in terms of operator names
vs operator semantics.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: