Sunday, July 27, 2008

Re: [PATCHES] pg_dump additional options for performance

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Right, but the parallelization is going to happen sometime, and it is
> going to happen in the context of pg_restore. So I think it's pretty
> silly to argue that no one will ever want this feature to work in
> pg_restore.

I think you've about convinced me on this, and it annoys me. ;) Worse
is that it sounds like this might cause the options to not make it in
for 8.4, which would be quite frustrating.

> To extend the example I just gave to Stephen, I think a fairly probable
> scenario is where you only need to tweak some "before" object
> definitions, and then you could do
>
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> psql -f before.sql target_db
> pg_restore --data-only --schema-after-data -d target_db whole.dump
>
> which (given a parallelizing pg_restore) would do all the time-consuming
> steps in a fully parallelized fashion.

Alright, this has been mulling around in the back of my head a bit and
has now finally surfaced: I like having the whole dump contained in a
single file, but I hate having what ends up being "out-dated" or "wrong"
or "not what was loaded" in the dump file. Doesn't seem likely to be
possible, but it'd be neat to be able to modify objects in the dump
file.

Also, something which often happens to me is that I need to change the
search_path or the role at the top of a .sql from pg_dump before
restoring it. Seems like using the custom format would make that
difficult without some pipe/cat/sed magic. Parallelization would make
using that kind of magic more difficult too, I would guess. Might be
something to think about.
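
A rough sketch of the kind of pipe magic meant above, not a tested recipe. It assumes the plain-text SQL contains a line of the form "SET search_path = ...;" near the top, which depends on the pg_dump version. The function name rewrite_search_path and the path 'app, public' are made up for illustration, and awk is used instead of sed so that only the first matching line is rewritten.

```shell
# Hypothetical helper: read SQL on stdin, replace the first
# "SET search_path = ...;" line with the given path, pass the rest through.
rewrite_search_path() {
    awk -v p="$1" '
        !done && /^SET search_path = / {
            print "SET search_path = " p ";"
            done = 1
            next
        }
        { print }
    '
}

# Intended use (assumes pg_restore writes SQL to stdout when no -d is given;
# some versions need an explicit "-f -" for that):
#   pg_restore whole.dump | rewrite_search_path 'app, public' | psql target_db
```

Because everything goes through a single pipe into psql, this gives up whatever a parallelizing pg_restore -d would offer, which is the difficulty noted above.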

Thanks,

Stephen
