Wednesday, September 24, 2008

Re: [HACKERS] parallel pg_restore

Dimitri Fontaine wrote:
> Le mercredi 24 septembre 2008, Andrew Dunstan a écrit :
>> No. The proposal will perform exactly the same set of steps as
>> single-threaded pg_restore, but in parallel. The individual steps won't
>> be broken up.
> Ok, good for a solid trustworthy parallelism restore. Which is exactly what we
> want. Just out of curiosity, do you plan to use Postgres-R helper backends
> infrastructure?

This is purely a patch to pg_restore. No backend changes at all (and if I
did, it would not use anything that isn't in core anyway).
>> Quite apart from anything else, parallel data loading of individual
>> tables will defeat clustering, as well as making it impossible to avoid
>> WAL logging of the load (which I have made provision for).
> Depends whether the different workers are able to work from the same
> transaction or not, I imagine. Some work has been done to allow multiple
> backends to be working in the exact same transaction (Simon's snapclone and
> Postgres-R helper backend infrastructure), so one of them could TRUNCATE the
> table and give a go signal to workers to fill the table. In the same
> transaction.
> Ok, I need to wake up now... :)

Again, I am not doing anything on the backend. I am following Tom's
original suggestion of simply having pg_restore run steps in parallel,
with no backend changes.
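The scheme Tom suggested (each worker executes one complete restore step, exactly as single-threaded pg_restore would, just concurrently with the others) can be sketched roughly as follows. The step strings and the run_step function are purely illustrative, not pg_restore's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical restore steps: each is one whole unit of work (load a
# table, build an index), executed by a single worker from start to
# finish -- the steps themselves are never broken up.
steps = [
    "COPY orders FROM stdin",
    "COPY customers FROM stdin",
    "CREATE INDEX orders_pk ON orders (id)",
]

def run_step(sql):
    # In a real restore each worker would hold its own connection and
    # issue the step's SQL; here we just simulate completing the step.
    return "done: " + sql

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_step, steps))
```

A real scheduler would also have to respect dependencies between steps (an index can only be built after its table's data is loaded), which is what pg_restore's dependency information provides.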

Also, you ignored the point about clustered data. Maybe that doesn't
matter to some people, but it does to others. This is designed to
provide the same result as a single threaded pg_restore. Splitting data
will break that.

>> The fact that custom archives are compressed by default would in fact
>> make parallel loading of individual tables' data difficult with the
>> present format. We'd have to do something like expanding it on the
>> client (which might not even have enough disk space) and then split it
>> before loading it to the server. That's pretty yucky. Alternatively,
>> each loader thread would need to start decompressing the data from the
>> start and throw away data until it got to the point it wanted to start
>> restoring from. Also pretty yucky.
> Another alternative is the round-robin reader implemented in pgloader, where
> all the archive reading is done by a single worker, which then splits what it
> reads among any number of coworkers, filling the next queue(s) while the
> previous one(s) are busy COPYing to the server.
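A minimal sketch of such a round-robin reader: one dispatcher deals rows out to per-worker queues, and each worker drains its queue as if COPYing to the server. The names and structure are illustrative, not pgloader's actual implementation:

```python
import queue
import threading

NUM_WORKERS = 2
queues = [queue.Queue() for _ in range(NUM_WORKERS)]
loaded = [[] for _ in range(NUM_WORKERS)]  # stand-in for server-side tables

def reader(rows):
    # Single reader: deal rows out round-robin, then signal end-of-stream.
    for i, row in enumerate(rows):
        queues[i % NUM_WORKERS].put(row)
    for q in queues:
        q.put(None)

def worker(idx):
    # Each coworker drains its own queue; a real one would COPY to the server.
    while (row := queues[idx].get()) is not None:
        loaded[idx].append(row)

rows = ["row%d" % i for i in range(6)]
threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
reader(rows)
for t in threads:
    t.join()
```

The single reader is the potential bottleneck Andrew alludes to below: every byte of the archive still flows through one decompressing process.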
>> Far better would be to provide for multiple data members in the archive
>> and teach pg_dump to split large tables as it writes the archive. Then
>> pg_restore would need comparatively little adjustment.
> Well, that's another possibility, but I tend to prefer having the parallelism
> mechanics on the restore side of things. It may be only an illusion, but
> this way I have far more trust in my backups.

Having pg_dump do the split would mean you get it for free, pretty much.
Rejecting that for a solution that could well be a bottleneck at restore
time would require lots more than just a feeling. I don't see how it
would give you any less reason to trust your backups.
>> Also, of course, you can split tables yourself by partitioning them.
>> That would buy you parallel data load with what I am doing now, with no
>> extra work.
> And that's excellent :)
>> In any case, data loading is very far from being the only problem. One
>> of my clients has long running restores where the data load takes about
>> 20% or so of the time - the rest is in index creation and the like. No
>> amount of table splitting will make a huge difference to them, but
>> parallel processing will.
> Oh yes, I'm running into this too (not on the same level but still).
> Parallel seqscan should help create indexes in parallel without the
> disks going crazy with read - write - read - write sequences, and POSIX
> advice should help a lot here too.
> Does the dependency tracker in pg_restore allow FK creation to be
> considered dependent on the matching PK already being there?

I believe so. If not, that's a bug and we should fix it IMNSHO.
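The kind of dependency ordering being discussed (an FK constraint must wait for the referenced table's PK, which in turn must wait for that table's data) amounts to a topological sort over the archive's dependency graph. A toy illustration, with made-up item names:

```python
from graphlib import TopologicalSorter

# Illustrative dependency graph: each key depends on the items in its set.
# An FK constraint depends on the referenced PK already existing; the PK
# depends on the table's data having been loaded.
deps = {
    "orders_fk_customer": {"customers_pk"},
    "customers_pk": {"customers_data"},
    "customers_data": set(),
}

order = list(TopologicalSorter(deps).static_order())
# Any valid order loads the data first, then the PK, then the FK.
```

In a parallel restore the sort is not a single linear order: items whose dependencies are all satisfied can run concurrently, which is where the speedup comes from.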

>> As against that, if your problem is in loading
>> one huge table, this won't help you much. However, this is not a pattern
>> I see much - most of my clients seem to have several large tables plus a
>> boatload of indexes. They will benefit a lot.
> The use case given by Greg Smith at the time was loading a multi-terabyte
> table on a raid array with a lot of spindles. It then becomes impossible for a
> single CPU to take full advantage of the available write bandwidth. No idea
> how common this situation is in the field, though.

I still think the multiple data members of the archive approach would be
best here. One that allowed you to tell pg_dump to split every nn rows,
or every nn megabytes. Quite apart from any parallelism issues, that
could help enormously when there is a data problem as happens from time
to time, and can get quite annoying if it's in the middle of a humungous
data load.
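The row-count variant of that split could look something like the following; split_rows is a hypothetical helper for illustration, not anything that exists in pg_dump:

```python
# Sketch: split a table's rows into fixed-size chunks, each of which would
# become a separate data member in the archive. A size-based variant would
# accumulate rows until a byte threshold is crossed instead.
def split_rows(rows, chunk_size):
    """Yield successive chunks of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

members = list(split_rows(["row%d" % i for i in range(10)], 4))
```

Besides enabling parallel load, smaller members localize a bad row: a failure aborts and is retried on one chunk rather than the whole table's data.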



Sent via pgsql-hackers mailing list