Monday, June 16, 2008

Re: [ADMIN] block error, but can't pg_dump

Thanks for your reply.  I'm afraid my Linux/Postgresql isn't quite good enough to use your answer.

1) Yes, I was selecting the records by the primary key.

2) Much of the data in this table can be dumped.  It's helpful for reports but it would be fine to archive it or even zero it out if it would get this working again.  I'd love to save recent data somehow, though, and the pg_dump only dumps the oldest.

3) I'm using Red Hat Linux, but when I tried to install the rhdb it failed on make although I downloaded version 4.0.  I am using Postgresql 8.0.13 -- I know I should upgrade but I needed an older version for older software I was using and then even now I shouldn't upgrade until I can backup everything.

4) I don't quite know how to get the OID of my database (directory name) and the relfilenode of the table.  If I try and connect to pg_database (psql -d pg_database) when signed in as the postgresql user, I get:
psql: FATAL:  database "pg_database" does not exist
I've looked online but can't find a page to describe what to do. 

I've thought about using delete * where... from the database to delete the older records in case that fixes it, but select * where... for older records doesn't work, so I assume delete won't either?  I don't want to try this if it doesn't even get me anywhere.

Thanks so much for your help.




At 03:44 PM 6/13/2008 -0400, Tom Lane wrote:


NOW Web Sites Manager <webperson@now.org> writes:
> I have a table that reports one block error when I try to select
> everything, reindex, vacuum, etc. (invalid page header in block
> 413620 of relation "tablename")  I've read on the list that the thing
> to do is use pg_dump and then restore it.  However, pg_dump fails
> with the same error.

> I can't take down the whole server, let alone easily Postgresql, to
> debug this.  Any ideas as to how I could do this?  I wrote a Perl
> script to select the records one by one but it didn't find a bad
> record although maybe it's how I wrote the script (it just selected
> the records one by one.)

Hm, were you selecting the records by primary key?  (or some other way
that would have produced an index search?)  If so, the above suggests
that the corrupted block is one that happens not to contain any live
data, which would be a lucky break for you.

Personally I'd want to know what was going on before taking any drastic
measures, so I'd try to dump out the block with pg_filedump:
http://sources.redhat.com/rhdb/

However, if you're comfortable that your perl script was able to
retrieve everything you care about from the table, you could skip
that and go directly to zeroing out the block, which is the appropriate
recovery action when you don't care about the data in it.  The usual
way to do that is with dd:
        dd if=/dev/zero of=TARGETFILE bs=8k seek=413620 count=1
(Sorry, no idea how to do it if you're on Windows.)

The trick is to be sure you've got the right target file.  You need the
OID of your database (directory name) and the relfilenode (NOT OID)
of the table, which you get from pg_database and pg_class respectively.
Then the file is $PGDATA/base/DBOID/RELFILENODE.

Hm, wait a second, block 413620 would be past the 1-GB mark, well
past it in fact.  What you're actually gonna need is
        dd if=/dev/zero of=TARGETFILE.3 bs=8k seek=20404 count=1
ie the filename you want is RELFILENODE.3 not just RELFILENODE.

You might want to read the "Database Physical Storage" chapter of
TFM to be sure you follow all this before you start zapping data.

                          regards, tom lane


No comments: