Tuesday, July 22, 2008

[HACKERS] Postgres-R: tuple serialization

Hi,

Yesterday I promised to outline the requirements of Postgres-R for
tuple serialization, which we have talked about before. There are
basically three ways to serialize a tuple change, depending on
whether it originates from an INSERT, UPDATE or DELETE. For updates and
deletes, the old pkey is serialized together with the origin (a global
transaction id) of the tuple (required for consistent serialization on
remote nodes). For inserts and updates, all added or changed attributes
need to be serialized as well.

         pkey+origin   changes
INSERT        -           x
UPDATE        x           x
DELETE        x           -
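
The table above can be read as a small lookup. The following is only an illustrative sketch in Python (Postgres-R itself is written in C); the names ChangeType, REQUIRED_PARTS and required_parts are made up for this example and do not exist in the Postgres-R code.

```python
from enum import Enum


class ChangeType(Enum):
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


# Hypothetical lookup mirroring the table:
# change type -> (needs old pkey + origin, needs changed attributes)
REQUIRED_PARTS = {
    ChangeType.INSERT: (False, True),
    ChangeType.UPDATE: (True, True),
    ChangeType.DELETE: (True, False),
}


def required_parts(change):
    """Return the names of the parts that must be serialized for a change."""
    needs_old_key, needs_changes = REQUIRED_PARTS[change]
    parts = []
    if needs_old_key:
        parts.append("pkey+origin")
    if needs_changes:
        parts.append("changes")
    return parts
```

For example, required_parts(ChangeType.DELETE) yields only the "pkey+origin" part, since a delete carries no new attribute values.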

Note that the pkey attributes may never be null, so an isnull bit field
can be skipped for those attributes. For the insert case, all attributes
(including primary key attributes) are serialized. Updates require an
additional bit field (well, I'm using chars ATM) to store which
attributes have changed. Only those should be transferred.
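
To make the update case more concrete, here is a rough sketch in Python of what such a layout could look like. It is not the actual Postgres-R wire format: the 64-bit origin, the 4-byte length prefixes, the use of real bits instead of chars, and the function names are all assumptions made for illustration. Attribute values are modeled as bytes, with None standing for NULL.

```python
import struct


def pack_bits(flags):
    """Pack a list of booleans into bytes, least significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def put(value):
    """Emit a value with an assumed 4-byte big-endian length prefix."""
    return struct.pack("!I", len(value)) + value


def serialize_update(origin, old_row, new_row, pkey_idx):
    """Serialize an UPDATE: origin, old pkey, changed bitmap, isnull bits, values."""
    changed = [old != new for old, new in zip(old_row, new_row)]

    # Origin (global transaction id); its width is an assumption here.
    out = struct.pack("!Q", origin)

    # Old pkey values: never NULL, so no isnull information is needed.
    for i in pkey_idx:
        out += put(old_row[i])

    # Bit field marking which attributes have changed.
    out += pack_bits(changed)

    # isnull bits only for changed attributes that are not part of the pkey.
    nullable = [i for i, c in enumerate(changed) if c and i not in pkey_idx]
    out += pack_bits([new_row[i] is None for i in nullable])

    # Only changed, non-NULL attribute values are transferred.
    for i, c in enumerate(changed):
        if c and new_row[i] is not None:
            out += put(new_row[i])
    return out
```

With a three-column row whose first column is the pkey, changing only the third column to NULL produces the origin, the old pkey value, one byte of changed bits and one byte of isnull bits, and no attribute values at all.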

I'm tempted to unify that, so that inserts are serialized as the
difference against the default values or NULL. That would make things
easier for Postgres-R. However, how about other uses of such a fast
tuple applicator? Does such a use case exist at all? I mean, for
parallelizing COPY FROM STDIN, one certainly doesn't want to serialize
all input tuples into that format before feeding multiple helper
backends. Instead, I'd recommend letting the helper backends do the
parsing and therefore parallelize that as well.
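
The unification idea mentioned above (serializing an insert as a difference against the defaults or NULL) could look roughly like the following Python sketch. The function name insert_as_update and the representation of defaults as a list, with None for columns that have no default, are assumptions made for this example only.

```python
def insert_as_update(new_row, defaults):
    """Treat the column defaults as the 'old' tuple and diff the insert against it.

    Returns (changed_flags, values), where values holds only the attributes
    that differ from their default (or from NULL if there is no default).
    """
    changed = [default != new for default, new in zip(defaults, new_row)]
    values = [new for new, is_changed in zip(new_row, changed) if is_changed]
    return changed, values
```

With this, an insert that leaves most columns at their defaults only has to transfer the few attributes that were actually given, using the same changed bit field as an update.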

For other features, like parallel pg_dump or even parallel query
execution, this tuple serialization code doesn't help much, IMO. So I'm
thinking that optimizing it for Postgres-R's internal use is the best
way to go.

Comments? Opinions?

Regards

Markus

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
