Friday, August 29, 2008

Re: [GENERAL] 8.3.1 query plan

Tom Lane wrote:
> Steve Clark <sclark@netwolves.com> writes:
>
>>Tom Lane wrote:
>>
>>>Consider testing the conditions on A at the top level, instead of
>>>redundantly checking them inside the sub-query on B.
>
>
>>Thanks for the response Tom, I am a SQL neophyte, so I'll try to
>>rework the query.
>
>
> What I meant to suggest was just
>
> explain insert into myevents select * from t_unit_event_log a where
> a.event_status = 1 and a.event_ref_log_no IS NOT NULL
> and a.event_log_no not in (select event_log_no from myevents)
> and exists (select b.event_log_no from myevents b
> where a.event_ref_log_no = b.event_log_no)
>
> ie, pull everything out of the subquery that doesn't depend on B.
>
> Although, looking at it in this form, it seems like you'd be well
> advised to then replace the EXISTS with an IN:
>
> ... and a.event_ref_log_no in (select b.event_log_no from myevents b)
>
> Although those two forms should be equivalent, reality is that current
> releases of PG are generally smarter about optimizing IN than EXISTS.
> (The difference should largely go away as of 8.4.)
>
> regards, tom lane
>
>
Thanks again Tom.

It appears to me that in both cases the planner ends up doing a seq scan of the myevents table
for each proposed new insertion from the t_unit_event_log table which make the query take to
long to be feasible. What I need to do is somehow immediately eliminate the candidate row from
the t_unit_event_log if the row is already in the myevents table, but I am at a loss as to how to do
that.

Regards,
Steve

srm2=# explain insert into myevents select * from t_unit_event_log a where
srm2-# a.event_status = 1 and a.event_ref_log_no IS NOT NULL
srm2-# and a.event_log_no not in (select event_log_no from myevents)
srm2-# and exists (select b.event_log_no from myevents b
srm2(# where a.event_ref_log_no = b.event_log_no);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using indx_tuel_usn_ec_es on t_unit_event_log a (cost=31711.73..3597826539.34 rows=51479 width=145)
Index Cond: (event_status = 1)
Filter: ((event_ref_log_no IS NOT NULL) AND (subplan) AND (NOT (subplan)))
SubPlan
-> Materialize (cost=31711.73..42857.85 rows=830612 width=4)
-> Seq Scan on myevents (cost=0.00..28041.12 rows=830612 width=4)
-> Index Scan using indx1myevents on myevents b (cost=0.00..8.37 rows=1 width=4)
Index Cond: ($1 = event_log_no)
(8 rows)

srm2=# explain insert into myevents select * from t_unit_event_log a where
a.event_status = 1 and a.event_ref_log_no IS NOT NULL
and a.event_log_no not in (select event_log_no from myevents)
and a.event_ref_log_no in (select b.event_log_no from myevents b);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Merge IN Join (cost=40821629.90..3265326013.41 rows=39021 width=145)
Merge Cond: (a.event_ref_log_no = b.event_log_no)
-> Index Scan using indx_tuel_erln on t_unit_event_log a (cost=31711.73..8616020218.25 rows=102959 width=145)
Filter: ((event_ref_log_no IS NOT NULL) AND (event_status = 1) AND (NOT (subplan)))
SubPlan
-> Materialize (cost=31711.73..42857.85 rows=830612 width=4)
-> Seq Scan on myevents (cost=0.00..28041.12 rows=830612 width=4)
-> Materialize (cost=119646.12..130028.77 rows=830612 width=4)
-> Sort (cost=119646.12..121722.65 rows=830612 width=4)
Sort Key: b.event_log_no
-> Seq Scan on myevents b (cost=0.00..28041.12 rows=830612 width=4)
(11 rows)


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: