Thursday, July 3, 2008

Re: [HACKERS] the un-vacuumable table

"Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
> On Thu, Jul 3, 2008 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The whole thing is pretty mystifying, especially the ENOSPC write
>> failure on what seems like it couldn't have been a full disk.

> Yes, I've passed along the task of explaining why PG thought the disk
> was full to the sysadmin responsible for the box. I'll post the answer
> here, when and if we have one.

I just noticed something even more mystifying: you said that the ENOSPC
error occurred once a day during vacuuming. That doesn't make any
sense, because a write error would leave the shared buffer still marked
dirty, and so the next checkpoint would try to write it again. If
there's a persistent write error on a particular block, you should see
it being complained of at least once per checkpoint interval.

If you didn't see that, it suggests that the ENOSPC was transient,
which isn't unreasonable --- but why would it recur for the exact
same block each night?
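
To make the argument above concrete, here is a toy model of it. It is only a sketch under stated assumptions: it is not PostgreSQL's buffer manager, and every name in it (Buffer, checkpoint, make_writer, simulate) is hypothetical. It assumes just what the text says, that a failed write leaves the buffer dirty so the next checkpoint tries it again.

```python
import errno


class Buffer:
    """Hypothetical stand-in for one shared buffer holding a modified block."""

    def __init__(self, block_no):
        self.block_no = block_no
        self.dirty = True


def checkpoint(buffers, write_block, log):
    """Try to flush every dirty buffer; a failed write leaves it dirty."""
    for buf in buffers:
        if not buf.dirty:
            continue
        try:
            write_block(buf.block_no)
        except OSError as exc:
            # Record the complaint; the buffer stays dirty for the next pass.
            log.append((buf.block_no, exc.errno))
            continue
        buf.dirty = False


def make_writer(bad_block, failures):
    """Build a write function that fails with ENOSPC on bad_block,
    but only for the first `failures` attempts."""
    remaining = [failures]

    def write_block(block_no):
        if block_no == bad_block and remaining[0] > 0:
            remaining[0] -= 1
            raise OSError(errno.ENOSPC, "No space left on device")

    return write_block


def simulate(failures, checkpoints):
    """Run several checkpoints over three dirty buffers; block 2 is the bad one."""
    buffers = [Buffer(1), Buffer(2), Buffer(3)]
    log = []
    writer = make_writer(bad_block=2, failures=failures)
    for _ in range(checkpoints):
        checkpoint(buffers, writer, log)
    return log, [buf.dirty for buf in buffers]


# Persistent error: one complaint per checkpoint, and block 2 never gets clean.
persistent_log, persistent_dirty = simulate(failures=10, checkpoints=4)

# Transient error: a single complaint, then the retry succeeds.
transient_log, transient_dirty = simulate(failures=1, checkpoints=4)
```

In this model a persistent failure shows up in the log once per checkpoint, while a transient one shows up once and then disappears. That is why a complaint seen only once a day points toward the transient case.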

Have you looked into the machine's kernel log to see if there is any
evidence of low-level distress (hardware or filesystem level)? I'm
wondering if ENOSPC is being reported because it is the closest
available errno code, but the real problem is something different from
what the error message text suggests. Other than the errno, the symptoms
all look quite a bit like a bad-sector problem ...
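
As a small aside on why the message text alone proves little: the text is just a lookup on the errno number, so whatever layer chose ENOSPC also determined the wording. The sketch below uses only Python's standard errno and os modules; the helper name describe_errno is hypothetical, and the exact message string varies by platform.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message text) for an errno value.

    The message comes purely from the number, not from whatever
    actually went wrong underneath.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_errno(errno.ENOSPC)
print(name, "->", message)
```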

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
