Thursday, July 17, 2008

Re: [HACKERS] Load spikes on 8.1.11

Just an addition: the strace output with the select calls timing out scrolls almost continuously; it doesn't seem to pause anywhere!

On Fri, Jul 18, 2008 at 9:16 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
Hi All,

    I have been perplexed by random load spikes on an 8.1.11 instance. Many times they are random, in the sense that we cannot tie them to a particular scenario as the cause. But a few times we can see that when we execute huge scripts, which include DDL as well as DML, the load on the box spikes to above 200. We see similar load spikes at other times too, when we are not running any such task on the DB.

    During these spikes, in our 'top' sessions we see the 'idle' PG processes consuming between 2 and 5% CPU each, and since the box has 8 CPU cores (2 sockets, each holding a quad-core Intel Xeon processor) and somewhere around 200 Postgres processes, the load spikes to above 200; and it does this very sharply.

    We are running the scripts using psql -f, but we can see the load even while running the commands one by one!

    When there's no load, an strace session on an 'idle' PG process looks like:

[postgres@db1 data]$ strace -p 9375
Process 9375 attached - interrupt to quit
recvfrom(9,  <unfinished ...>
Process 9375 detached


    But under these heavy load conditions, the strace of an 'idle' PG process looks like:

[postgres@db1 data]$ strace -p 22994
Process 22994 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 11000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 14000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 17000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 31000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 51000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 5000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 12000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 12000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 23000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 27000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 47000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 70000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 11000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 19000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 35000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 53000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 75000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 76000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 102000}) = 0 (Timeout)
Process 22994 detached
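
To make the pattern in a trace like this easier to see, here is a small Python sketch that pulls the timeout out of each select line and starts a new group whenever the timeout drops back down. The function names (`parse_timeouts_us`, `group_runs`) are made up for illustration, and the regular expression assumes the exact `{sec, usec}` format that strace printed here; it is only one way to summarise the output, not a diagnosis.

```python
import re

# Matches the timeval argument strace prints, e.g. "{0, 1000}" (seconds, microseconds).
SELECT_RE = re.compile(r"select\(0, NULL, NULL, NULL, \{(\d+), (\d+)\}\)")

def parse_timeouts_us(trace_text):
    """Return the timeout of each select() line in microseconds."""
    timeouts = []
    for line in trace_text.splitlines():
        match = SELECT_RE.search(line)
        if match:
            sec, usec = int(match.group(1)), int(match.group(2))
            timeouts.append(sec * 1_000_000 + usec)
    return timeouts

def group_runs(timeouts):
    """Split the timeouts into runs; a new run starts when the value drops."""
    runs = []
    for t in timeouts:
        if runs and t >= runs[-1][-1]:
            runs[-1].append(t)
        else:
            runs.append([t])
    return runs

# A few lines copied from the trace, used as sample input.
sample = """\
select(0, NULL, NULL, NULL, {0, 31000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 51000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
"""

for run in group_runs(parse_timeouts_us(sample)):
    print(len(run), "calls,", sum(run) / 1000, "ms requested:", run)
```

Fed the whole trace for process 22994, this grouping gives four runs, each starting at 1000 microseconds and growing from there before dropping back to 1000.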


    So I guess there's something very wrong with the above 'select' calls.
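
    One thing worth keeping in mind when reading the trace: a select() call with no file descriptors and only a timeout is a common way for a program to sleep for a fraction of a second, so `= 0 (Timeout)` is simply what such a call returns when the time is up. A minimal Python sketch of the same kind of call, assuming a Unix-like system (the helper name `sleep_via_select` is made up here):

```python
import select
import time

def sleep_via_select(microseconds):
    """Sleep by calling select() with no file descriptors, only a timeout."""
    # Roughly the same as select(0, NULL, NULL, NULL, {0, microseconds}) in C.
    # Passing three empty lists is platform-dependent; it works on Unix-like systems.
    return select.select([], [], [], microseconds / 1_000_000)

start = time.monotonic()
ready = sleep_via_select(1000)   # 1000 microseconds, the shortest timeout in the trace
elapsed = time.monotonic() - start

# No descriptors were watched, so all three "ready" lists come back empty.
print(ready, f"{elapsed * 1000:.2f} ms")
```

    This only illustrates what a single call does; it says nothing about why the backend issues so many of them in a row.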

    Can somebody please shed some light on this? Let me know what OS/hardware specs you need.

    Any help is greatly appreciated.

Thanks in advance,


--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device



