Friday, June 27, 2008

Re: [PERFORM] Federated Postgresql architecture ?

josh@agliodbs.com (Josh Berkus) writes:
> Jonah,
>
>> Hmm, I didn't think the Skype tools could really provide federated
>> database functionality without a good amount of custom work. Or, am I
>> mistaken?
>
> Sure, what do you think pl/proxy is for?

Ah, but the thing is, it changes the model from a relational one,
where you can have fairly arbitrary "where clauses," to one where
parameterization of queries must be predetermined.

The "hard part" of federated database functionality at this point is
the [parenthesized portion] of...

select * from table@node [where criterion = x];

What we'd like to be able to do is to ascertain that [where criterion
= x] portion, and run it on the remote DBMS, so that only the relevant
tuples would come back.

Consider...

What if table@node is a remote table with 200 million tuples, and
[where criterion = x] restricts the result set to 200 of those.

If you *cannot* push the "where clause" down to the remote node, then
you're stuck with pulling all 200 million tuples, and filtering out,
on the "local" node, the 200 tuples that need to be kept.

To do better, with pl/proxy, requires having a predetermined function
that would do that filtering, and if it's missing, you're stuck
pulling 200M tuples, and throwing out nearly all of them.

In contrast, with the work David Fetter's looking at, the [where
criterion = x] clause would get pushed to the node which the data is
being drawn from, and so the query, when running on "table@node,"
could use indices, and return only the 200 tuples that are of
interest.

It's a really big win, if it works.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/lisp.html
"The avalanche has started, it is too late for the pebbles to vote"
-- Kosh, Vorlon Ambassador to Babylon 5

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

No comments: