Monday, September 8, 2008

[HACKERS] Common Table Expressions (WITH RECURSIVE) patch

These are my initial comments about the Common Table Expressions (CTE)
patch, also known as WITH [RECURSIVE]. These comments are based on the
patch here:

http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php

This is a semantically complex feature, and the standard is fairly
complex as well. So I'm approaching this by making my own
interpretations from the standard first (I included my interpretations
and section references at the end of this email) and comparing to the
behavior of the patch.

The following examples may be inconsistent with the standard. Some
have already been mentioned, and I don't think they all need to be
fixed for 8.4, but I mention them here for completeness.

* Mutual Recursion:

with recursive
foo(i) as (values(1) union all select i+1 from bar where i < 10),
bar(i) as (values(1) union all select i+1 from foo where i < 10)
select * from foo;
ERROR: mutual recursive call is not supported

The standard allows mutual recursion.

* Single Evaluation:

with
foo(i) as (select random() as i)
select * from foo union all select * from foo;
i
-------------------
0.233165248762816
0.62126633618027
(2 rows)

The standard specifies that non-recursive WITH should be evaluated
once.

* RHS only:

with recursive
foo(i) as (select i+1 from foo where i < 10 union all values(1))
select * from foo;
ERROR: Left hand side of UNION ALL must be a non-recursive term in a
recursive query

The standard does not require that the recursive term be on the RHS.

* UNION ALL only:

with recursive
foo(i) as (values(1) union select i+1 from foo where i < 10)
select * from foo;
ERROR: non-recursive term and recursive term must be combined with
UNION ALL

The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT
(when the recursive term is not on the RHS of the EXCEPT).

* Binary recursion and subselect strangeness:

with recursive foo(i) as
(values (1)
union all
select * from
(select i+1 from foo where i < 10
union all
select i+1 from foo where i < X) t)
select * from foo;

Produces 10 rows of output regardless of what "X" is. This should be
fixed for 8.4.
Also, this is non-linear recursion, which the standard seems to
disallow.

* Multiple recursive references:

with recursive foo(i) as
(values (1)
union all
select i+1 from foo where i < 10
union all
select i+1 from foo where i < 20)
select * from foo;
ERROR: Left hand side of UNION ALL must be a non-recursive term in a
recursive query

If we're going to allow non-linear recursion (which the standard
does not), this seems like it should be a valid case.

* Strange result with except:

with recursive foo(i) as
(values (1)
union all
select * from
(select i+1 from foo where i < 10
except
select i+1 from foo where i < 5) t)
select * from foo;
ERROR: table "foo" has 0 columns available but 1 columns specified

This query works if you replace "except" with "union". This should be
fixed for 8.4.

* Aggregates allowed:

with recursive foo(i) as
(values(1)
union all
select max(i)+1 from foo where i < 10)
select * from foo;

Aggregates should be blocked according to the standard.
Also, causes an infinite loop. This should be fixed for 8.4.

* DISTINCT should supress duplicates:

with recursive foo(i) as
(select distinct * from (values(1),(2)) t
union all
select distinct i+1 from foo where i < 10)
select * from foo;

This outputs a lot of duplicates, but they should be supressed
according to the standard. This query is essentially the same as
supporting UNION for recursive queries, so we should either fix both for
8.4 or block both for consistency.

* outer joins on a recursive reference should be blocked:

with recursive foo(i) as
(values(1)
union all
select i+1 from foo left join (values(1)) t on (i=column1))
select * from foo;

Causes an infinite loop, but the standard says using an outer join
in this situation should be prohibited. This should be fixed for 8.4.

* ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The
standard does not seem to say that these should be rejected.


The following are my interpretations of relevant parts of the SQL
standard (200n), and the associated sections. These are only my
interpretation, so let me know if you interpret the standard
differently.

Non-linear recursion forbidden:
7.13: Syntax Rules: 2.g.ii
My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the
same <query name>.
7.13: Syntax Rules: 2.g.iv

EXCEPT can't be used for recursive queries if a recursive reference
appears on the RHS:
7.13: Syntax Rules: 2.g.iii.1

INTERSECT ALL/EXCEPT ALL can't be used for recursive queries:
7.13: Syntax Rules: 2.g.iii.5

recursive references must appear in FROM clause:
7.13: Syntax Rules: 2.g.iii.3
My interpretation of this rule is that it does not allow a
recursive reference in a subquery in the targetlist or a subquery
in the where clause.

stratum defined:
7.13: Syntax Rules: 2.f
7.13: Syntax Rules: 2.g.i.1

recursive query must have anchor for every stratum:
7.13: Syntax Rules: 2.g.i.3

outer joins not allowed to join recursive references:
7.13: Syntax Rules: 2.g.iii.6
7.13: Syntax Rules: 2.g.iii.7

Aggregates/HAVING disallowed:
7.13: Syntax Rules: 2.g.iii.4.B

Mutual recursion defined:
7.13: Syntax Rules: 2.g.i.1

Evaluate each WITH entry once, even if it's referenced multiple times:
7.13: General Rules: 1
7.13: General Rules: 2.b
See also:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

Evaluation order with mutual recursion:
7.13: General Rules: 2.a
7.13: General Rules: 2.b

Evaluation semantics:
7.13: General Rules: 2.c

DISTINCT:
7.13: General Rules: 2.c.iv
7.13: General Rules: 2.c.ix.3.A

I will provide comments about the code and documentation soon. This is a
very useful feature.

Regards,
Jeff Davis


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: