Friday, September 12, 2008

[PERFORM] Postgres Performance on CPU limited Platforms

I'm trying to optimize postgres performance on a headless, solid-state
hardware platform (no fans or disks). I have the database stored on a
USB 2.0 flash drive (hdparm benchmarks reads at 10 MB/s). Performance is
limited by the 533 MHz CPU.

Hardware:
IXP425 XScale (big endian), 533 MHz, 64 MB RAM
USB 2.0 Flash Drive

Software:
Linux 2.6.21.4
postgres 8.2.5

I created a fresh database using initdb, then added one table.

Here is the create table:
CREATE TABLE archivetbl
(
    "DateTime"       timestamp without time zone,
    "StationNum"     smallint,
    "DeviceDateTime" timestamp without time zone,
    "DeviceNum"      smallint,
    "Tagname"        character(64),
    "Value"          double precision,
    "Online"         boolean
)
WITH (OIDS=FALSE);
ALTER TABLE archivetbl OWNER TO novatech;

I've attached my postgresql.conf.

I populated the table with 38098 rows.

I'm doing this simple query:
select * from archivetbl;

It takes 79 seconds to complete the query (when postgres is compiled
with -O2). I'm running the query from pgadmin3 over TCP/IP.

top shows CPU usage is at 100% with 95% being in userspace. oprofile
shows memset is using 58% of the CPU cycles!

CPU: ARM/XScale PMU2, speed 0 MHz (estimated)
Counted CPU_CYCLES events (clock cycles counter) with a unit mask of
0x00 (No unit mask) count 100000
samples  %        app name     symbol name
288445   57.9263  libc-2.5.so  memset
33273     6.6820  vmlinux      default_idle
27910     5.6050  vmlinux      cpu_idle
12611     2.5326  vmlinux      schedule
8803      1.7678  libc-2.5.so  __printf_fp
7448      1.4957  postgres     dopr
6404      1.2861  libc-2.5.so  vfprintf
6398      1.2849  oprofiled    (no symbols)
4992      1.0025  postgres     __udivdi3
4818      0.9676  vmlinux      run_timer_softirq


I was having trouble getting oprofile to give a back trace for memset
(probably because my libc is optimized). So I redefined MemSet to call this:
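void *gmm_memset(void *s, int c, size_t n)
{
    size_t i;   /* size_t matches n and avoids a signed/unsigned comparison */
    unsigned char *p = (unsigned char *) s;

    /* Plain byte-at-a-time fill, compiled into postgres so that the
     * profiler can attribute the time to a named symbol. */
    for (i = 0; i < n; i++)
    {
        p[i] = (unsigned char) c;   /* use the requested fill value */
    }
    return s;
}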

Here are the oprofile results for the same select query.

CPU: ARM/XScale PMU2, speed 0 MHz (estimated)
Counted CPU_CYCLES events (clock cycles counter) with a unit mask of
0x00 (No unit mask) count 100000
samples  %        image name  app name   symbol name
-------------------------------------------------------------------------------
1        5.2e-04  postgres    postgres   LockAcquire
1        5.2e-04  postgres    postgres   set_ps_display
20       0.0103   postgres    postgres   pg_vsprintf
116695   60.2947  postgres    postgres   dopr
116717   60.3061  postgres    postgres   gmm_memset
116717   60.3061  postgres    postgres   gmm_memset [self]
-------------------------------------------------------------------------------
20304    10.4908  oprofiled   oprofiled  (no symbols)
20304    10.4908  oprofiled   oprofiled  (no symbols) [self]
-------------------------------------------------------------------------------
4587     2.3700   vmlinux     vmlinux    rest_init
6627     3.4241   vmlinux     vmlinux    cpu_idle
11214    5.7941   vmlinux     vmlinux    default_idle
11214    5.7941   vmlinux     vmlinux    default_idle [self]
-------------------------------------------------------------------------------
16151    8.3450   vmlinux     vmlinux    rest_init
9524     4.9209   vmlinux     vmlinux    cpu_idle
9524     4.9209   vmlinux     vmlinux    cpu_idle [self]
6627     3.4241   vmlinux     vmlinux    default_idle
-------------------------------------------------------------------------------
5111     2.6408   oprofile    oprofile   (no symbols)
5111     2.6408   oprofile    oprofile   (no symbols) [self]

oprofile shows dopr is making most of the calls to memset.

Are these results typical? If memset is indeed using over 50% of the CPU,
something seems seriously wrong.

Should I be expecting more performance from this hardware than what I'm
getting in these tests?

Regards,
George McCollister
