Friday, August 29, 2008

Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception

david@lang.hm wrote:
> On Thu, 28 Aug 2008, Alvaro Herrera wrote:
>
>> david@lang.hm wrote:
>>> On Thu, 28 Aug 2008, Scott Marlowe wrote:
>>
>>>> scenario 1: There's a postmaster, it owns all the child processes.
>>>> It gets killed. The Postmaster gets restarted. Since there isn't one
>>>
>>> when the postmaster gets killed doesn't that kill all its children as
>>> well?
>>
>> Of course not. The postmaster gets a SIGKILL, which is instant death.
>> There's no way to signal the children. If they were killed too then
>> this wouldn't be much of a problem.
>
> I'm not saying that it would signal its children, I thought that the OS
> killed children (unless steps were taken to allow them to re-parent)

Oh, you were mistaken then.
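
For readers who want to check this on their own machine, here is a minimal
Python sketch, assuming a POSIX system. The function name and the 30-second
sleeps are made up for illustration; this is not PostgreSQL code. A stand-in
"postmaster" forks one stand-in "backend", gets a SIGKILL, and the backend is
then probed to see whether it is still there.

```python
import os
import signal
import time


def child_survives_parent_sigkill() -> bool:
    """Return True if a child is still alive after its parent is SIGKILLed."""
    read_fd, write_fd = os.pipe()
    parent_pid = os.fork()
    if parent_pid == 0:
        # Stand-in for the postmaster: fork one child, report its pid, idle.
        os.close(read_fd)
        child_pid = os.fork()
        if child_pid == 0:
            # Stand-in for a backend: just stay alive for a while.
            os.close(write_fd)
            time.sleep(30)
            os._exit(0)
        os.write(write_fd, str(child_pid).encode())
        os.close(write_fd)
        time.sleep(30)
        os._exit(0)

    # Observer process: learn the child's pid through the pipe.
    os.close(write_fd)
    child_pid = int(os.read(read_fd, 32).decode())
    os.close(read_fd)

    os.kill(parent_pid, signal.SIGKILL)  # instant death, no handler can run
    os.waitpid(parent_pid, 0)            # reap the killed stand-in postmaster
    time.sleep(0.2)                      # give the kernel a moment to re-parent

    try:
        os.kill(child_pid, 0)            # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    os.kill(child_pid, signal.SIGKILL)   # clean up the orphan
    return True
```

The orphan is simply re-parented; nothing in the kernel kills it on the
parent's behalf.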

>>> well, if you aren't going through the postmaster, what process is
>>> receiving network messages? it can't be a group of processes, only one
>>> can be listening to a socket at one time.
>>
>> Huh? Each backend has its own socket.
>
> we must be talking about different things. I'm talking about the socket
> that would be used for clients to talk to postgres, this is either a TCP
> socket or a unix socket. in either case only one process can listen on
> it.

Obviously only one process (the postmaster) can call listen() on a given
TCP address/port. Once connected, the socket is passed to the
backend, and the postmaster is no longer involved in the communication
between backend and client. Each backend has its own socket. If the
postmaster dies, the established communication is still alive.
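
The distinction between the listening socket and the per-connection socket
can be seen in a small Python sketch. It runs in a single process and uses
made-up names, so it is only an analogy: in PostgreSQL the accepted socket
ends up in a separate backend process, which this sketch does not model. What
it does show is that closing the listener leaves an accepted connection
usable.

```python
import socket


def connection_outlives_listener() -> bytes:
    """Accept one connection, close the listener, then keep talking."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    # accept() returns a new socket, distinct from the listening one.
    backend_sock, _ = listener.accept()

    listener.close()                  # the listening side goes away

    # The established connection still carries data in both directions.
    client.sendall(b"ping")
    data = backend_sock.recv(4)
    backend_sock.sendall(b"pong:" + data)
    reply = client.recv(16)

    client.close()
    backend_sock.close()
    return reply
```

New clients can no longer connect once the listener is closed, but the
already accepted socket is unaffected.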


>>> and if the postmaster isn't needed for the child processes to write to
>>> the datastore, how are multiple child processes prevented from writing to
>>> the datastore normally? and why doesn't that mechanism continue to work?
>>
>> They use locks. Those locks are implemented using shared memory. If a
>> new postmaster starts, it gets a new shared memory, and a new set of
>> locks, that do not conflict with the ones already held by the first gang
>> of backends. This is what causes the corruption.
>
> so the new postmaster needs to detect that there is a shared memory
> segment out there that is used by backends for this database.

> this doesn't sound that hard,

You're welcome to suggest actual improvements to our interlocking
system, after you've read the current code and understood its rationale.
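
As a rough illustration of the failure mode quoted above, consider two lock
tables that do not know about each other. This is a toy Python sketch with
hypothetical names: in-process `threading.Lock` objects stand in for locks
kept in two different shared memory segments, and nothing here reflects how
PostgreSQL actually implements its locks.

```python
import threading


def demo_disjoint_lock_tables():
    # Hypothetical stand-ins: each dict plays the role of the lock table
    # living in one shared memory segment.
    old_segment = {"relation_42": threading.Lock()}  # first postmaster
    new_segment = {"relation_42": threading.Lock()}  # restarted postmaster

    # An orphaned backend locks the relation in the old segment ...
    orphan_got_lock = old_segment["relation_42"].acquire(blocking=False)
    # ... and a new backend locks "the same" relation in the new segment.
    new_backend_got_lock = new_segment["relation_42"].acquire(blocking=False)
    # Within one segment the lock does its job: a second taker is refused.
    second_taker_same_segment = new_segment["relation_42"].acquire(blocking=False)

    return orphan_got_lock, new_backend_got_lock, second_taker_same_segment
```

Both "backends" believe they hold the lock on the same relation, because
each one only consults its own table.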


>> Any other signal gives it the chance to signal the children before
>> dying.
>
> are you sure that it's not going to die from a memory allocation error?
> or any other similar type of error without _always_ killing the children?

I am sure. There are no memory allocations in that code. It is
carefully written with that one purpose.

There may be bugs, but that's another matter. This code was written
eons ago and has proven very healthy.
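
For contrast with SIGKILL, here is a Python sketch of a parent that catches
SIGTERM and passes it on to its child before exiting, assuming a POSIX
system. All names are invented for illustration. It only demonstrates the
forwarding idea: a Python handler obviously allocates memory, so it says
nothing about the allocation-free property of the real code.

```python
import os
import signal
import time


def sigterm_is_forwarded_to_child() -> bool:
    """Return True if the child is gone after its parent receives SIGTERM."""
    read_fd, write_fd = os.pipe()
    parent_pid = os.fork()
    if parent_pid == 0:
        os.close(read_fd)
        # Make sure the child starts with the default SIGTERM disposition.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        child_pid = os.fork()
        if child_pid == 0:
            os.close(write_fd)
            time.sleep(30)
            os._exit(0)

        def forward_and_exit(signum, frame):
            os.kill(child_pid, signal.SIGTERM)  # tell the child to stop
            os.waitpid(child_pid, 0)            # wait until it is really gone
            os._exit(0)

        signal.signal(signal.SIGTERM, forward_and_exit)
        # Report the pid only after the handler is installed.
        os.write(write_fd, str(child_pid).encode())
        os.close(write_fd)
        time.sleep(30)
        os._exit(0)

    os.close(write_fd)
    child_pid = int(os.read(read_fd, 32).decode())
    os.close(read_fd)

    os.kill(parent_pid, signal.SIGTERM)  # catchable, unlike SIGKILL
    os.waitpid(parent_pid, 0)

    try:
        os.kill(child_pid, 0)            # signal 0 only checks for existence
    except ProcessLookupError:
        return True
    os.kill(child_pid, signal.SIGKILL)   # clean up if forwarding failed
    return False
```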

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
