Friday, August 1, 2008

Re: [PATCHES] pg_dump additional options for performance

tgl@sss.pgh.pa.us (Tom Lane) writes:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> I want to dump tables separately for performance reasons. There are
>> documented tests showing 100% gains using this method. There is no gain
>> adding this to pg_restore. There is a gain to be had - parallelising
>> index creation, but this patch doesn't provide parallelisation.
>
> Right, but the parallelization is going to happen sometime, and it is
> going to happen in the context of pg_restore. So I think it's pretty
> silly to argue that no one will ever want this feature to work in
> pg_restore.

"Never" is a long time, agreed.

> To extend the example I just gave to Stephen, I think a fairly probable
> scenario is where you only need to tweak some "before" object
> definitions, and then you could do
>
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> psql -f before.sql target_db
> pg_restore --data-only --schema-after-data -d target_db whole.dump
>
> which (given a parallelizing pg_restore) would do all the time-consuming
> steps in a fully parallelized fashion.

Do we need to wait until a fully-parallelizing pg_restore is
implemented before adding this functionality to pg_dump?

The particular extension I'm interested in from pg_dump, here, is the
ability to dump multiple tables concurrently. I've got disk arrays
with enough I/O bandwidth that this form of parallelization does
provide a performance benefit.
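
For illustration only, here is a minimal sketch of what "dump multiple tables concurrently" can look like with a stock pg_dump, done by hand from the shell rather than by the patch under discussion. The function name, the PG_DUMP override variable, and the one-file-per-table layout are assumptions made for the example. Note that each pg_dump process runs in its own transaction, so the per-table files are not guaranteed to be mutually consistent unless writes are stopped while the dumps run.

```shell
#!/bin/sh
# Sketch: start one pg_dump process per table and let them run concurrently.
# PG_DUMP can be overridden (hypothetical knob, handy for testing).
PG_DUMP=${PG_DUMP:-pg_dump}

# usage: dump_tables_concurrently DBNAME OUTDIR TABLE [TABLE...]
dump_tables_concurrently() {
    db=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for tbl in "$@"; do
        # -t selects a single table, -f names the output file.
        # The trailing & puts each dump in the background.
        "$PG_DUMP" -t "$tbl" -f "$outdir/$tbl.sql" "$db" &
    done
    # Block until every background dump has finished.
    wait
}
```

A call might look like `dump_tables_concurrently mydb /backups/mydb orders customers`, where the database, directory, and table names are placeholders. Whether this helps depends on having the spare I/O bandwidth described above; on a single busy disk the concurrent dumps may simply contend with each other.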

The result of that will be that *many* files are generated, and I
don't imagine we want to change pg_restore to try to make it read from
multiple files concurrently.

Further, it's actually not obvious that we *necessarily* care about
parallelizing loading data. The thing that happens every day is
backups. I care rather a lot about optimizing that; we do backups
each and every day, and optimizations to that process will accrue
benefits each and every day.

In contrast, restoring databases does not take place every day. When
it happens, yes, there's considerable value to making *that* go as
quickly as possible, but I'm quite willing to consider optimizing that
to be separate from optimizing backups.

I daresay I haven't used pg_restore any time recently, either. Any
time we have thought about using it, we've concluded that the
perceived benefits were actually more of a mirage.
--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #145. "My dungeon cell decor will not
feature exposed pipes. While they add to the gloomy atmosphere, they
are good conductors of vibrations and a lot of prisoners know Morse
code." <http://www.eviloverlord.com/>

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
