Sunday, September 21, 2008

Re: [HACKERS] parallel pg_restore

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>>
>
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>>
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.
>


Indeed, that's exactly what I'm doing. However, given that time for the
8.4 window is short, I thought it would be sensible to get people
thinking about what the policy might be, while I get on with the mechanism.
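
To make the split concrete, here is a minimal sketch in Python rather than pg_restore's C. Every name in it (schedule, first_ready, the item names) is hypothetical and nothing here is taken from the actual patch. The mechanism only tracks which items have their dependencies satisfied; the policy is a separate function that picks among the ready items, so it can be swapped later without touching the infrastructure.

```python
from typing import Callable, Dict, List, Set

# A policy receives the items that are ready now (in archive order) and picks one.
Policy = Callable[[List[str]], str]


def first_ready(ready: List[str]) -> str:
    """Deliberately dumb policy: take the earliest ready item in archive order."""
    return ready[0]


def schedule(items: List[str], deps: Dict[str, Set[str]], policy: Policy) -> List[str]:
    """Mechanism: work out which items are ready, then ask the policy to choose."""
    done: Set[str] = set()
    order: List[str] = []
    while len(order) < len(items):
        # An item is ready once everything it depends on has been restored.
        ready = [i for i in items if i not in done and deps.get(i, set()) <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        chosen = policy(ready)
        done.add(chosen)
        order.append(chosen)
    return order
```

A smarter policy (say, one that prefers large tables, or items that sit close together in the archive) would be passed in place of first_ready, with schedule left unchanged.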

>
>> Another question is what we should do if the user supplies an explicit
>> order with --use-list. I'm inclined to say we should stick strictly with
>> the supplied order. Or maybe that should be an option.
>>
>
> Hmm. I think --use-list is used more for selecting a subset of items
> to restore than for forcing a nondefault restore order. Forcing the
> order used to be a major purpose, but that was years ago before we
> had the dependency-driven-restore-order code working. So I'd vote that
> the default behavior is to still allow parallel restore when this option
> is used, and we should provide an orthogonal option that disables use of
> parallel restore.
>
> You'd really want the latter anyway for some cases, i.e., when you don't
> want the restore trying to hog the machine. Maybe the right form for
> the extra option is just a limit on how many connections to use. Set it
> to one to force the exact restore order, and to other values to throttle
> how much of the machine the restore tries to eat.
>

My intention is to have single-thread restore remain the default, at
least for this round, and to let the user specify --multi-thread=nn to
set the number of concurrent connections to use.
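
As a rough illustration of the connection limit idea, here is a Python sketch. It assumes a hypothetical restore_with_limit function whose worker threads stand in for database connections; it ignores dependencies entirely and is not how pg_restore is implemented. With a limit of one, items complete strictly in the supplied order; with a larger limit, the same items are all restored but the completion order is no longer guaranteed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


def restore_with_limit(items: List[str], connections: int) -> List[str]:
    """Run one stand-in restore step per item on at most `connections` workers.

    Returns the items in the order in which they finished.
    """
    if connections < 1:
        raise ValueError("connections must be at least 1")
    finished: List[str] = []
    lock = threading.Lock()

    def restore_one(item: str) -> None:
        # Placeholder for sending the item's SQL over one connection.
        with lock:
            finished.append(item)

    # The pool size plays the role of the connection limit.
    with ThreadPoolExecutor(max_workers=connections) as pool:
        for item in items:
            pool.submit(restore_one, item)
    return finished
```

With connections=1 there is a single worker draining the queue in submission order, which is what makes the exact order recoverable.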

> One problem here though is that you'd need to be sure you behave sanely
> when there is a dependency chain passing through an object that's not to
> be restored. The ordering of the rest of the chain still ought to honor
> the dependencies I think.
>
>
>

Right. I think we'd need to go through the motions of a full restore,
but skip the actual restore step for items not on the passed-in list.
That should be simple enough.
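
A minimal sketch of that idea, in Python and with hypothetical names (plan_restore, the three example objects): order the whole archive by its dependencies as if everything were being restored, mark each item done in turn, and only emit the ones that were actually selected. This is an illustration under those assumptions, not the eventual pg_restore code.

```python
from typing import Dict, List, Set


def plan_restore(archive: List[str], deps: Dict[str, Set[str]],
                 selected: Set[str]) -> List[str]:
    """Order the full archive by dependencies, then keep only selected items."""
    done: Set[str] = set()
    full_order: List[str] = []
    while len(full_order) < len(archive):
        # Ready means every dependency has been (really or notionally) restored.
        ready = [i for i in archive if i not in done and deps.get(i, set()) <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        nxt = ready[0]
        done.add(nxt)  # mark as done even if the item will not be restored
        full_order.append(nxt)
    # Omitted items took part in the ordering but are skipped here.
    return [i for i in full_order if i in selected]


# view_c depends on func_b, which depends on table_a; func_b is not selected.
archive = ["view_c", "func_b", "table_a"]
deps = {"view_c": {"func_b"}, "func_b": {"table_a"}}
plan = plan_restore(archive, deps, {"view_c", "table_a"})
```

Here plan puts table_a ahead of view_c even though the link between them, func_b, is left out, so the chain's ordering is still honoured.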

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
