> Dear all:
> We are currently considering using PostgreSQL to host a read only warehouse, we would like to get some experiences, best practices and performance metrics from the user community, following is the question list:
> 1. What's size of your database?
Varies. I've had reporting dbs in the low 100s of gigabytes.
> 2. What Operating System are you using?
I've generally worked with Linux. RHEL, Centos, or Ubuntu.
> 3. What level is your RAID array?
For transactional, ALWAYS RAID 10. For reporting sometimes RAID-5,
mostly RAID-10
The reporting server I built at my last company was a collection of
spare parts and ran a software RAID-10 over 4 150G sata drives. It
routinely outran the Oracle RAC cluster with 14 drives in RAID 6
sitting next to it doing reports on the same data.
> 4. How many cores and memory does your server have?
The reporting server from my last company had a single hyperthreaded
P4 and 4 Gig of ram.
Current transactional server runs on 8 opterons, with 32 Gigs of ram.
> 5. What about your performance of join operations?
Always been pretty good. Kind of a wide open question really. I'd
say PostgreSQL's query planner is usually very smart planning complex
queries. note that joins were never an issue, but I had to pay
attention to how I designed correlated subqueries and aggregate
queries.
> 6. What about your performance of load operations?
Pretty much dependent on the hardware you're on. I can replicate the
current ~15 Gig transactional db in about 15 or 20 minutes from one 8
core 16 drive machine to another.
> 7. How many concurrent readers of your database, and what's the average transfer rate, suppose all readers are doing one table scaning.
Concurrent but idle connections in production are around 600. Active
connections at a time are in the dozens. I can read at about 60 to 70
Megs a second for random access and around 350 to 400 Megs a second
for sequential reads.
> 8. Single instance or a cluster, what cluster software are you using if you have a cluster?
Two machines with one as slony master and the other as slony slave,
with the application doing weighted load balancing on reads between
the two.
The important thing about pgsql is how well it scales to work on
larger hardware.
If you throw enough drives on a quality RAID controller at it you can
get very good throughput. If you're looking at read only / read
mostly, then RAID5 or 6 might be a better choice than RAID-10. But
RAID 10 is my default choice unless testing shows RAID-5/6 can beat
it.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
No comments:
Post a Comment