Monday, August 18, 2008

Re: [HACKERS] IN vs EXISTS equivalence

Hello

I did some fast test on pagila database.

8.4
postgres=# explain analyze select * from film f where exists (select
film_id from film_actor where f.film_id = film_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=117.01..195.42 rows=966 width=390) (actual
time=36.011..43.314 rows=997 loops=1)
Hash Cond: (f.film_id = film_actor.film_id)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.027..1.971 rows=1000 loops=1)
-> Hash (cost=104.94..104.94 rows=966 width=2) (actual
time=35.886..35.886 rows=997 loops=1)
-> HashAggregate (cost=95.28..104.94 rows=966 width=2)
(actual time=32.650..34.139 rows=997 loops=1)
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462
width=2) (actual time=0.081..14.232 rows=5462 loops=1)
Total runtime: 45.373 ms
(7 rows)

8.3
postgres=# explain select * from film f where exists (select film_id
from film_actor where f.film_id = film_id);
QUERY PLAN
------------------------------------------------------------------------------------------
Seq Scan on film f (cost=0.00..4789.34 rows=500 width=390)
Filter: (subplan)
SubPlan
-> Index Scan using idx_fk_film_id on film_actor
(cost=0.00..28.35 rows=6 width=2)
Index Cond: ($0 = film_id)
(5 rows)

postgres=# explain analyze select * from film f where not exists
(select film_id from film_actor where f.film_id = film_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Anti Join (cost=149.90..240.24 rows=34 width=390) (actual
time=25.473..28.169 rows=3 loops=1)
Hash Cond: (f.film_id = film_actor.film_id)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.027..1.898 rows=1000 loops=1)
-> Hash (cost=81.62..81.62 rows=5462 width=2) (actual
time=24.398..24.398 rows=5462 loops=1)
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462
width=2) (actual time=0.035..12.400 rows=5462 loops=1)
Total runtime: 28.866 ms

postgres=# explain analyze select * from film f where not exists
(select film_id from film_actor where f.film_id = film_id);
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on film f (cost=0.00..4789.34 rows=500 width=390) (actual
time=5.874..22.654 rows=3 loops=1)
Filter: (NOT (subplan))
SubPlan
-> Index Scan using idx_fk_film_id on film_actor
(cost=0.00..28.35 rows=6 width=2) (actual time=0.016..0.016 rows=1
loops=1000)
Index Cond: ($0 = film_id)
Total runtime: 22.835 ms
(6 rows)

postgres=# explain analyze select * from film f where film_id in
(select film_id from film_actor);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=117.01..195.42 rows=966 width=390) (actual
time=43.151..53.688 rows=997 loops=1)
Hash Cond: (f.film_id = film_actor.film_id)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.021..6.765 rows=1000 loops=1)
-> Hash (cost=104.94..104.94 rows=966 width=2) (actual
time=43.091..43.091 rows=997 loops=1)
-> HashAggregate (cost=95.28..104.94 rows=966 width=2)
(actual time=34.754..36.275 rows=997 loops=1)
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462
width=2) (actual time=0.024..15.746 rows=5462 loops=1)
Total runtime: 55.291 ms

postgres=# explain analyze select * from film f where film_id in
(select film_id from film_actor);
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop IN Join (cost=0.00..175.25 rows=986 width=390) (actual
time=0.090..14.272 rows=997 loops=1)
-> Seq Scan on film f (cost=0.00..65.00 rows=1000 width=390)
(actual time=0.014..1.877 rows=1000 loops=1)
-> Index Scan using idx_fk_film_id on film_actor (cost=0.00..0.54
rows=6 width=2) (actual time=0.007..0.007 rows=1 loops=1000)
Index Cond: (film_actor.film_id = f.film_id)
Total runtime: 15.902 ms
(5 rows)

8.4
postgres=# explain analyze select * from film f where film_id not in
(select film_id from film_actor);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on film f (cost=95.28..162.78 rows=500 width=390) (actual
time=24.324..26.015 rows=3 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462 width=2)
(actual time=0.026..12.074 rows=5462 loops=1)
Total runtime: 26.201 ms
(5 rows)

8.3
postgres=# explain analyze select * from film f where film_id not in
(select film_id from film_actor);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on film f (cost=95.28..162.78 rows=500 width=390) (actual
time=29.713..30.456 rows=3 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on film_actor (cost=0.00..81.62 rows=5462 width=2)
(actual time=0.051..13.649 rows=5462 loops=1)
Total runtime: 30.644 ms
(5 rows)


8.4 is 10% faster than 8.3 on small table < 1000 rows. 8.4 has much
better prediction.

Regards
Pavel Stehule

2008/8/17 Tom Lane <tgl@sss.pgh.pa.us>:
> If you're still interested in testing CVS HEAD's handling of EXISTS,
> I've about finished what I wanted to do with it.
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

No comments: