Wednesday, July 23, 2008

[HACKERS] Postgres-R: internal messaging

Hi,

As you certainly know by now, Postgres-R introduces an additional
manager process. That one is forked from the postmaster, as are all
backends, regardless of whether they are processing local or remote
transactions. That led to a communication problem, which was originally
(i.e. around Postgres-R for 6.4) solved by using unix pipes. I didn't
like that approach for various reasons: first, AFAIK there are
portability issues; second, it eats file descriptors; and third, it
involves copying the messages around several times. As the replication
manager needs to talk to the backends, but both are forked from the
postmaster, pipes would also have to go through the postmaster process.

Trying to be as portable as Postgres itself while still having an
efficient messaging system, I came up with the imessages stuff, which
I've already posted to -patches [1]. It uses shared memory to store and
'transfer' the messages, and signals (the so far unused SIGUSR2, IIRC)
to notify other processes. Of course, this implies a hard limit on the
total size of messages waiting to be delivered, due to the fixed size of
the shared memory area.
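To illustrate that hard limit, here is a minimal sketch of the space accounting only. It assumes a hypothetical IMsgArea struct and hypothetical imsg_area_init()/imsg_reserve()/imsg_release() helpers, which are not the names used in the actual patch. Once the fixed area is full, a sender has to fail (or wait and retry) because the area cannot grow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the fixed-size imessages area. In the
 * real thing this header would live in shared memory and be sized once
 * at server start; here it is an ordinary struct for illustration. */
typedef struct
{
    size_t capacity;    /* total bytes available, fixed at creation */
    size_t used;        /* bytes held by messages not yet delivered */
} IMsgArea;

static void
imsg_area_init(IMsgArea *area, size_t capacity)
{
    area->capacity = capacity;
    area->used = 0;
}

/* Reserve room for a message of len bytes. Fails instead of growing
 * the area once the hard limit is reached. */
static bool
imsg_reserve(IMsgArea *area, size_t len)
{
    if (len > area->capacity - area->used)
        return false;
    area->used += len;
    return true;
}

/* Give the space back once the recipient has consumed the message. */
static void
imsg_release(IMsgArea *area, size_t len)
{
    area->used -= len;
}
```

With a 100-byte area, reserving 60 bytes succeeds, a further 50 bytes is refused, and after the first message is released the 50-byte reservation fits.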

Besides the communication between the replication manager and the
backends, which is currently done by using these imessages, the
replication manager also needs to communicate with the postmaster: it
needs to be able to request new helper backends and it wants to be
notified upon termination (or crash) of such a helper backend (and other
backends as well...). I'm currently doing this with imessages as well,
which violates the rule that the postmaster must not touch shared
memory. I haven't looked into ripping that out yet, and I'm not sure it
can be done with the existing signaling of the postmaster.

Let's take a simple example: consider a local transaction which changes
some tuples. Those are collected into a change set, which gets written
to the shared memory area as an imessage for the replication manager.
The backend then signals the manager, which wakes up from its select(),
checks its imessage queue and processes the message, delivering it to
the GCS (group communication system). The manager then removes the
imessage from the shared memory area again.
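The flow above can be sketched in a self-contained way. The calling process plays the replication manager and a forked child plays the backend. This is a minimal illustration under several assumptions, not the Postgres-R code: the "queue" is a single hypothetical SharedSlot in an anonymous shared mmap() that stands in for the imessages area, and run_imessage_demo() is a made-up name. It also uses pselect() rather than plain select(), so that unblocking SIGUSR2 and going to sleep happen atomically.

```c
#define _DEFAULT_SOURCE             /* exposes MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical one-slot "queue" in shared memory. */
typedef struct
{
    atomic_int pending;             /* 1 while a message awaits the manager */
    char       payload[64];         /* e.g. a serialized change set */
} SharedSlot;

static void
handle_sigusr2(int signo)
{
    (void) signo;                   /* only used to interrupt pselect() */
}

/* Returns 0 and copies the received message into out (outlen > 0). */
static int
run_imessage_demo(char *out, size_t outlen)
{
    SharedSlot *slot = mmap(NULL, sizeof(SharedSlot), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return -1;
    atomic_init(&slot->pending, 0);

    /* Keep SIGUSR2 blocked except while waiting inside pselect(). */
    sigset_t block, oldmask, waitmask;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    sigprocmask(SIG_BLOCK, &block, &oldmask);
    waitmask = oldmask;
    sigdelset(&waitmask, SIGUSR2);

    struct sigaction sa, oldsa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigusr2;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, &oldsa);

    int rc = -1;
    pid_t backend = fork();
    if (backend == 0)
    {
        /* "Backend": write the message, publish it, then notify. */
        strcpy(slot->payload, "change set #1");
        atomic_store(&slot->pending, 1);
        kill(getppid(), SIGUSR2);
        _exit(0);
    }
    if (backend > 0)
    {
        /* "Manager": sleep until signalled, re-checking the queue each time. */
        rc = 0;
        while (!atomic_load(&slot->pending))
        {
            if (pselect(0, NULL, NULL, NULL, NULL, &waitmask) < 0 && errno != EINTR)
            {
                rc = -1;
                break;
            }
        }
        if (rc == 0)
        {
            /* Process the message, then remove it from the shared area. */
            strncpy(out, slot->payload, outlen - 1);
            out[outlen - 1] = '\0';
            atomic_store(&slot->pending, 0);
        }
        waitpid(backend, NULL, 0);
    }

    /* Unblock first (a still-pending SIGUSR2 hits our handler), then restore. */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    sigaction(SIGUSR2, &oldsa, NULL);
    munmap(slot, sizeof(SharedSlot));
    return rc;
}
```

Blocking the signal outside pselect() closes the race in which the backend signals between the manager's check of its queue and its call to the wait function. With plain select(), the usual alternative is a self-pipe written from the signal handler.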

My initial design features only a single doubly linked list as the
message queue, holding all messages for all processes. A single
imessages lock blocks concurrent write access. That's still what's in
there, but I realize it's not enough. Each process should rather have
its own queue, and the single lock needs to go to avoid contention on
it. However, that would require dynamically allocatable shared memory...
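For reference, the current single-queue design could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual patch:

- IMsg, IMsgQueue and the imsg_* functions are hypothetical names.
- A pthread mutex stands in for the imessages lock. Across processes it would need PTHREAD_PROCESS_SHARED, or a PostgreSQL LWLock or spinlock instead.
- Raw pointers stand in for what would have to be offsets into the shared memory area.

The sketch shows both problems: every sender and receiver serializes on one lock, and a receiver has to scan past other processes' messages.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical message header. */
typedef struct IMsg
{
    struct IMsg *prev;
    struct IMsg *next;
    pid_t        recipient;     /* process the message is addressed to */
    int          type;          /* hypothetical message type tag */
} IMsg;

/* One list for all processes, guarded by one lock. */
typedef struct
{
    pthread_mutex_t lock;
    IMsg           *head;
    IMsg           *tail;
} IMsgQueue;

static void
imsg_queue_init(IMsgQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Append at the tail; every sender contends on q->lock. */
static void
imsg_enqueue(IMsgQueue *q, IMsg *m)
{
    pthread_mutex_lock(&q->lock);
    m->next = NULL;
    m->prev = q->tail;
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
}

/* Unlink and return the oldest message for `me`, or NULL if none.
 * Because the list is shared, this must skip others' messages. */
static IMsg *
imsg_dequeue_for(IMsgQueue *q, pid_t me)
{
    IMsg *m;

    pthread_mutex_lock(&q->lock);
    for (m = q->head; m != NULL; m = m->next)
    {
        if (m->recipient != me)
            continue;
        if (m->prev)
            m->prev->next = m->next;
        else
            q->head = m->next;
        if (m->next)
            m->next->prev = m->prev;
        else
            q->tail = m->prev;
        m->prev = m->next = NULL;
        break;
    }
    pthread_mutex_unlock(&q->lock);
    return m;
}
```

With one queue per process, the scan disappears and the lock can be split per queue.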

As another side note: I've had to write methods similar to those in
libpq, which serialize and deserialize integers or strings. The libpq
functions were not appropriate because they cannot write to shared
memory; instead they are designed to flush to a socket, if I understand
correctly. Maybe these could be extended or modified to be usable there
as well? I hesitated and implemented separate methods in
src/backend/storage/ipc/buffer.c instead.
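Here is a minimal sketch of such serialization helpers. It assumes a hypothetical MsgBuf cursor over a caller-supplied region, for instance space already reserved in the shared memory area. The msgbuf_* names are illustrative and are not the functions in buffer.c. Integers are written in network byte order, as in the frontend/backend protocol, and every write checks the remaining room instead of flushing to a socket.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a fixed region; it never flushes or grows. */
typedef struct
{
    char   *data;
    size_t  len;    /* usable size of data */
    size_t  pos;    /* next read/write offset */
} MsgBuf;

static int
msgbuf_put_int32(MsgBuf *b, int32_t v)
{
    uint32_t n = htonl((uint32_t) v);       /* network byte order */

    if (b->len - b->pos < sizeof(n))
        return -1;                          /* no room left in the region */
    memcpy(b->data + b->pos, &n, sizeof(n));
    b->pos += sizeof(n);
    return 0;
}

/* Strings are stored including their terminating NUL. */
static int
msgbuf_put_string(MsgBuf *b, const char *s)
{
    size_t n = strlen(s) + 1;

    if (b->len - b->pos < n)
        return -1;
    memcpy(b->data + b->pos, s, n);
    b->pos += n;
    return 0;
}

static int
msgbuf_get_int32(MsgBuf *b, int32_t *v)
{
    uint32_t n;

    if (b->len - b->pos < sizeof(n))
        return -1;
    memcpy(&n, b->data + b->pos, sizeof(n));
    b->pos += sizeof(n);
    *v = (int32_t) ntohl(n);
    return 0;
}

/* Returns a pointer into the region (no copy), or NULL if unterminated. */
static const char *
msgbuf_get_string(MsgBuf *b)
{
    const char *s = b->data + b->pos;
    const char *end = memchr(s, '\0', b->len - b->pos);

    if (end == NULL)
        return NULL;
    b->pos += (size_t) (end - s) + 1;
    return s;
}
```

A writer and a reader simply use two cursors over the same region. The reader's len is set to however many bytes the writer produced.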

Comments?

Regards

Markus Wanner

[1]: last time I published IMessage stuff on -patches, WIP:
http://archives.postgresql.org/pgsql-patches/2007-01/msg00578.php


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
