Monday, July 7, 2008

[HACKERS] 8.1 index corruption woes

#!/usr/bin/perl

use warnings;
use strict;

my $PG_FILEDUMP = "pg_filedump-8.1";

process_table();
exit();

sub process_table {
my ($table, @indexes) = read_toc();

foreach my $elem ($table, @indexes) {
die "file $elem does not exist" unless -f "$elem";
if (!-f "$elem.dump") {
print STDERR "generating pg_filedump for $elem\n";
`$PG_FILEDUMP -i $elem > $elem.dump`;
}
}

print STDERR "loading unused line pointers for table $table\n";
my $unused = get_heap_unused($table);

foreach my $index (@indexes) {
print STDERR "processing index $index\n";
process_index($unused, $index);
}
}

# Reads a "toc" file, which is a description of a table and its indexes. A
# table line is "table: xxx" where xxx is the relfilenode of the table, and an
# index line is "index: xxx" where xxx is the relfilenode of the index.
sub read_toc {
my $table;
my @indexes;

print STDERR "now expecting a TOC in stdin...\n";
while (<>) {
if (/^table: ([0-9]+)$/) {
$table = $1;
}
if (/^index: ([0-9]+)$/) {
push @indexes, $1;
}
}
print STDERR "finished reading the TOC for table $table\n";
print STDERR ("indexes: ", join(", ", @indexes), "\n");
return $table, @indexes;
}

# Reads a pg_filedump -i report for the given table, and fetches into a hash
# all the line pointers that are marked "unused". Returns the hash so built.
sub get_heap_unused {
my $table = shift;
my $curblock;
my $unused = {};

my $infile = "$table.dump";
open IN, "<", "$infile" or die "can't open $infile: $!";
while (<IN>) {
if (/^Block\s+([0-9]+) /) {
$curblock = $1;
}

if (/^ Item\s+([0-9]+) -- Length:.*Flags: 0x00/) {
push @{$unused->{$curblock}}, $1;
}
}

return $unused;
}

# Given an index and a hash built by get_heap_unused, reads the pg_filedump -i
# report for that index and prints a list of all index pointers that point to
# any element in the hash.
sub process_index {
my $unused = shift;
my $index = shift;

my $curblock;
my $special = 0;
my $isleaf = 0;
my $collect = 0;
my @lines = ();

my $infile = "$index.dump";

open IN, "<", "$infile" or die "can't open $infile: $!";
while (<IN>) {
if (/^Block\s+([0-9]+) /) {
$curblock = $1;
}
if (/^<Data> --/) {
$collect = 1;
next;
}

if (/^<Special Section> -----$/) {
$special = 1;
next;
}

if ($collect) {
push @lines, $_;
}

if ($special) {
if (/^ Flags.*LEAF/) {
$isleaf = 1;
}
}

if (/^$/) {
if ($special) {
if ($isleaf) {
report_broken_block($unused,
$index, $curblock, @lines);
}
$isleaf = 0;
$special = 0;
@lines = ();
}
}
}
}

# workhorse for process_index; gets an array of lines comprising an index
# block Data entries, and prints those that match the unused hash.
sub report_broken_block {
my $unused = shift;
my $indexid = shift;
my $block = shift;
my $item;

foreach my $line (@_) {
if ($line =~ /^ Item\s+([0-9]+)/) {
$item = $1;
}
if ($line =~ /^ Block Id:\s+([0-9]+)\s+linp Index: ([0-9]+)/) {
my $blk = $1;
my $lp = $2;

next unless defined $unused->{$blk};
foreach my $hlp (@{$unused->{$blk}})
{
next unless $lp eq $hlp;

print "INDEX PTR TO UNUSED HEAP: ".
"index $indexid ($block,$item) -> ($blk, $lp)\n";
}
}
}
}
Hi,

We've detected what I think is some sort of index corruption in 8.1.
The server is running 8.1.11, so AFAICT the problem with truncated pages
in vacuum is already patched and accounted for (i.e. we reindexed, and a
bit later the problem presented itself again). There haven't been any
relevant fixes after that AFAICT.

What we see is that after a bit of updating the index, it starts having
tuples that poing to heap entries which are marked unused.

I detected one of these almost by accident, and then built a tool to
discover them by examining pg_filedump output. Basically what it does
is scan the heap, note which heap tuples are marked "unused", and then
scan the index and for each index tuple in leaf pages, see if it points
to an unused heap tuple. The number of occurrences is amazingly high.
Right after a reindex there isn't any occurrence; but after a while of
application load, a lot of them appear.

I catalogued this as index corruption: the theory is that as soon as the
unused heap tuple is reused, the index will have a pointer with the
wrong index key pointing to a live heap tuple.

(We saw an occurrence of this too, but I wasn't motivated enough to
write a tool to verify the data in index vs. heap tuples.)

However, seeing the high prevalence of the problem, I started
considering whether the tool is misreading the output anyway -- i.e.
maybe it's a bug in the tool, or a misconception on my part.

The tool output looks like this:

INDEX PTR TO UNUSED HEAP: index 273375 (78,18) -> (5530, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (96,84) -> (5436, 3)
INDEX PTR TO UNUSED HEAP: index 273375 (111,1) -> (1317, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (145,1) -> (1665, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (174,1) -> (2656, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (199,1) -> (1745, 21)
INDEX PTR TO UNUSED HEAP: index 273375 (207,1) -> (358, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (214,1) -> (2800, 17)
(many more lines of the same stuff)

What this means (first line) is that on index 273375, page 78, offset
18, there's a pointer to heap page 5530, offset 17; but that heap offset
shows up as unused. This is from the heap:

Block 5530 ********************************************************
<Header> -----
Block Offset: 0x02b34000 Offsets: Lower 136 (0x0088)
Block: Size 8192 Version 3 Upper 992 (0x03e0)
LSN: logid 130 recoff 0xd53b3090 Special 8192 (0x2000)
Items: 29 Free Space: 856
Length (including item array): 140

[ ... snip other items ...]

Item 17 -- Length: 0 Offset: 504 (0x01f8) Flags: 0x00

This is from the index:

Block 78 ********************************************************
<Header> -----
Block Offset: 0x0009c000 Offsets: Lower 788 (0x0314)
Block: Size 8192 Version 3 Upper 5104 (0x13f0)
LSN: logid 131 recoff 0x01b3c6e0 Special 8176 (0x1ff0)
Items: 192 Free Space: 4316
Length (including item array): 792

[ ... snip other items ... ]

Item 18 -- Length: 16 Offset: 5184 (0x1440) Flags: USED
Block Id: 5530 linp Index: 17 Size: 16
Has Nulls: 0 Has Varwidths: 0


Of course, we're very concerned about this problem. Personally I am
prepared to believe that this could be a hardware problem, because no
one else seems to be reporting this kind of thing, but I don't see how
it could cause this particular issue and not more wide-ranging data
corruption.

I would like comments on whether this is really a problem or I am just
misreading pg_filedump output. If we confirm this, we can try to
proceed to investigate it further.

I attach the checker tool in case it is of any interest.

Thanks,

--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

No comments: