Monday, July 28, 2008

Re: [PERFORM] how does pg handle concurrent queries and same queries

Hi,

here is what the original query was which was obviously nonsense :
EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, s.elofordulas FROM letoltes
INNER JOIN (select letoltes.cid, count(letoltes.cid) AS elofordulas FROM
letoltes GROUP BY cid) s ON s.cid=letoltes.cid ORDER BY s.elofordulas DESC
LIMIT 5;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------
Limit (cost=73945.35..73945.65 rows=5 width=12) (actual
time=4191.396..4351.966 rows=5 loops=1)
-> Unique (cost=73945.35..77427.99 rows=58800 width=12) (actual
time=4191.390..4351.956 rows=5 loops=1)
-> Sort (cost=73945.35..75106.23 rows=464351 width=12) (actual
time=4191.386..4283.545 rows=175944 loops=1)
Sort Key: s.elofordulas, letoltes.cid
-> Merge Join (cost=9257.99..30238.65 rows=464351 width=12)
(actual time=652.535..2920.304 rows=464351 loops=1)
Merge Cond: ("outer".cid = "inner".cid)
-> Index Scan using idx_letoltes_cid on letoltes
(cost=0.00..12854.51 rows=464351 width=4) (actual time=0.084..1270.588
rows=464351 loops=1)
-> Sort (cost=9257.99..9258.73 rows=294 width=12)
(actual time=652.434..810.941 rows=464176 loops=1)
Sort Key: s.cid
-> Subquery Scan s (cost=9242.26..9245.94
rows=294 width=12) (actual time=651.343..652.028 rows=373 loops=1)
-> HashAggregate (cost=9242.26..9243.00
rows=294 width=4) (actual time=651.339..651.661 rows=373 loops=1)
-> Seq Scan on letoltes
(cost=0.00..6920.51 rows=464351 width=4) (actual time=0.014..307.469
rows=464351 loops=1)
Total runtime: 4708.434 ms
(13 sor)

However after fixing the query this is 1/4 th of the time but still blocks
the site :


EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid) AS
elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------
Limit (cost=9255.05..9255.09 rows=5 width=4) (actual time=604.734..604.743
rows=5 loops=1)
-> Unique (cost=9255.05..9257.26 rows=294 width=4) (actual
time=604.732..604.737 rows=5 loops=1)
-> Sort (cost=9255.05..9255.79 rows=294 width=4) (actual
time=604.730..604.732 rows=5 loops=1)
Sort Key: count(cid), cid
-> HashAggregate (cost=9242.26..9243.00 rows=294 width=4)
(actual time=604.109..604.417 rows=373 loops=1)
-> Seq Scan on letoltes (cost=0.00..6920.51
rows=464351 width=4) (actual time=0.022..281.413 rows=464351 loops=1)
Total runtime: 604.811 ms


here is the table :
\d letoltes
TĂĄbla "public.letoltes"
Oszlop | TĂ­pus | MĂłdosĂ­tĂł
--------+---------+------------------------------------------------
id | integer | not null default nextval('letoltes_seq'::text)
cid | integer |
Indexes:
"idx_letoltes_cid" btree (cid)
"idx_letoltes_id" btree (id)

select count(1) from letoltes;
count
--------
464351


VACUM ANALYZE runs overnight every day.

thanks,
Gabor

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Claus Guttesen
Sent: Monday, July 28, 2008 8:56 AM
To: Faludi Gábor
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how does pg handle concurrent queries and same
queries

> I have taken over the maintenance of a server farm , recently. 2 webserver
> on db server. They are quite powerful 2 processor xeon w/ 6Gig of ram .
>
> Couple of days ago we had a serious performance hit and the db server (pg.
> v7.4) was overloaded w/ something in a way that operating system was
almost
> not able to respond or in cases it did not.
>
> After some analysis i suspect that there is a query that takes up to 1
> second and that is the cause. Upon each page loading this query fires and
> takes the one second and blocks the page to load completly . The load was
> roughly ~300 connections in one minute .
>
> So here are my questions :
>
> . Why does the second and the later queries take the whole on
second
> if the dataset is the same . Shouldn't PG realise that the query is the
same
> so i give the user the same resultset ?
>
> . How do I know if one query blocks the other ?
>
> . Is there a way to log the long running queries in 7.4 ? If not
is
> it available in any newer version ?

Can you post the queries? Can you provide an 'analyze explain'? Do you
perform a 'vacuum analyze' on a regular basis?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

No virus found in this incoming message.
Checked by AVG - http://www.avg.com
Version: 8.0.138 / Virus Database: 270.5.6/1576 - Release Date: 2008.07.27.
16:16


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

No comments: