Wednesday, July 30, 2008

Re: [BUGS] Segfault manifesting in libm from cost_sort

ECC memory, RAID 10 with an Adaptec 3405 hardware controller. The period between crashes ranged from about 1 to 5 minutes. We just switched to a new box, so the problem is "gone". In the original email I meant Ubuntu 6.06.

On Wed, Jul 30, 2008 at 12:06 AM, John R Pierce <pierce@hogranch.com> wrote:
Andrew Badr wrote:
Our pg keeps going into recovery mode after segfaulting. This is a compiled 8.3.3 on Ubuntu 6.04 with Slony.

Some ideas from IRC:
RhodiumToad: 1) probably least likely, something corrupt in the math libs; the fact that it's not reproducible makes this improbable
RhodiumToad: 2) more likely: a register or memory stomp in a signal handler, which could be the result of an OS bug or a pg miscompile
RhodiumToad: 3) slightly less likely: a memory stomp somewhere else in pg that happens to be clobbering something in the math library

Does this server have ECC memory? If not...

Pierce:  4) flakey memory


Does this server have an 'enterprise' grade disk system (e.g., SAS, SCSI, Fibre Channel, with a decent quality RAID controller)? If it's a desktop ATA/SATA disk...

Pierce: 5) flakey disk drive or channel


