Tuesday, August 5, 2008

Re: [HACKERS] Automatic Client Failover

On Tuesday, August 5, 2008, Tom Lane wrote:
> Huh? The problem case is that the primary server goes down, which would
> certainly mean that a pgbouncer instance on the same machine goes with
> it. So it seems to me that integrating pgbouncer is 100% backwards.

With all due respect, it seems to me you're missing an important piece of the
scheme here, which means I certainly failed to explain it correctly. I'm not
sure that detailing what I have in mind will answer your concerns, but
still...

What I have in mind is running the pgbouncer listener process at both the
master and the slave sites. Clients can then connect to the slave even for
normal operations, and the listener process simply forwards them to the
master, transparently.
When we later provide RO slaves, some queries could be processed locally
instead of being sent to the master.
The point is that the client does not have to care whether it's connecting to
a master or a slave: -core knows what each node can handle for the client and
handles it (proxying the connection).

Now, that does not solve client-side automatic failover per se; it's another
way to think about it:
- both master & slave accept connections in any mode
- master & slave are able to "speak" to each other (a life link)
- when the master knows it's crashing (elog(FATAL)), it can say so to the slave
- once told, the slave can switch to master
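The handshake above can be sketched as a toy state machine. The `Node`, `link`, `fatal`, and `promote` names are all hypothetical; this only models the "master announces its own death over the life link, slave promotes itself" sequence, not any real -core interface.

```python
class Node:
    """Toy model of the master/slave life link (hypothetical API)."""

    def __init__(self, role):
        self.role = role  # "master", "slave", or "down"
        self.peer = None

    def link(self, other):
        # Establish the bidirectional "life link".
        self.peer = other
        other.peer = self

    def fatal(self):
        # The master knows it's crashing (elog(FATAL)):
        # tell the slave before going down.
        if self.role == "master" and self.peer is not None:
            self.peer.promote()
        self.role = "down"

    def promote(self):
        # Once told, the slave switches to master.
        if self.role == "slave":
            self.role = "master"

master, slave = Node("master"), Node("slave")
master.link(slave)
master.fatal()
# slave.role is now "master", master.role is "down"
```

As noted below, this only covers the crashes the master can still report; a hard crash never calls `fatal()`, which is why external heartbeat/STONITH machinery is still needed.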

It obviously only catches the errors on the master that we're able to log
about, so on its own it does nothing to provide HA in case of a hard master
crash.
But...

> Failover that actually works is not something we can provide with
> trivial changes to Postgres. It's really a major project in its
> own right: you need heartbeat detection, STONITH capability,
> IP address redirection, etc. I think we should be recommending
> external failover-management project(s) instead of offering a
> half-baked home-grown solution. Searching freshmeat for "failover"
> finds plenty of potential candidates, but not having used any of
> them I'm not sure which are worth closer investigation.

We have worked here with heartbeat, and automating failover is hard, not only
for technical reasons but also because:
- current PostgreSQL offers no synchronous replication, so switching means
trading away or losing the D in ACID,
- you do not want to lose any committed data.

If 8.4 resolves this, implementing failover will be a lot easier.

Where I see my proposal fitting is in handling part of the smartness in -core
directly, so that the hard parts of STONITH/failover/switchback could be
implemented in cooperation with -core rather than playing tricks against it.

For example, switching back when the master comes back online would only mean
having the master tell the slave to redirect queries to it as soon as it's
ready --- which is still the hard part: syncing the data back.

Having clients able to blindly connect to the master or any slave, with the
smartness about the current cluster topology in -core, would certainly help
here, even if it does not fulfill all HA goals.

Of course, in the case of a hard master crash, we still have to make sure it
won't restart on its own, and we need an external way to make a chosen slave
become the master.

I'm even envisioning that -core could help STONITH projects by providing
something like the recovery.conf file, allowing the old master to restart in
not-up-to-date slave mode. Whether we implement resyncing to the new master
in -core or from external scripts is another question, but -core could
certainly help here (even if not in 8.4, of course).
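Purely as a sketch of what such a file might look like, a restarted ex-master could find a hypothetical config like this and come up as a stale slave instead of claiming mastership (every parameter name here is invented for illustration):

```
# hypothetical "demoted.conf", analogous in spirit to recovery.conf:
start_as = 'slave'                        # refuse to accept writes on restart
sync_from = 'host=newmaster port=5432'    # where to catch up from
stale_ok = on                             # accept being not up to date
```

The point is only that -core would honor the file's presence, so external STONITH tooling could drop it in place before allowing a restart.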

I still think this proposal has a place in the scheme of an integrated HA
solution and offers interesting bits.

Regards,
--
dim
