Friday, August 8, 2008

Re: [HACKERS] For what should pg_stop_backup wait?

On Fri, 2008-08-08 at 11:47 +0900, Fujii Masao wrote:
> On Thu, Aug 7, 2008 at 11:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >

> In this situation, the history file (000000010000000000000004.00000020.backup)
> is behind the stoppoint (000000010000000000000004) in the alphabetic order.
> So, pg_stop_backup should wait for both the stoppoint and the history
> file, I think.

OK, I see that now.
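
The ordering is easy to confirm with a plain strcmp() on the two names quoted above. This is only a sketch: sorts_after() is a hypothetical helper, not a backend function, and it assumes byte-wise comparison as strcmp() does it.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a backend function): returns true if name "a"
 * sorts strictly after name "b" in byte-wise order, as strcmp() sees it.
 */
static bool
sorts_after(const char *a, const char *b)
{
    return strcmp(a, b) > 0;
}
```

Called with 000000010000000000000004.00000020.backup as "a" and 000000010000000000000004 as "b", it returns true: the segment name is a proper prefix of the history file name, so the history file compares greater.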

>
> > ! while (!XLogArchiveCheckDone(stopxlogfilename, false))
>
> If a concurrent checkpoint removes the status file before XLogArchiveCheckDone,
> pg_stop_backup continues waiting forever. This is undesirable behavior.

I think it will only get removed by the second checkpoint, not the
first, so the risk of that happening seems very low.
But we'll put in a check just in case.

> Yes, statement_timeout may help. But, I don't want to use it, because the
> *successful* backup is canceled.
>
> How about checking whether the stoppoint was archived by comparing with
> the last WAL archived. The archiver process can tell the last WAL archived.
> Or, we can calculate it from the status file.

I think it's easier to test whether the stopxlogfilename still exists in
pg_xlog. If not, we know it has been archived away. We can add that as
an extra condition inside the loop.

So I'm thinking we should test XLogArchiveCheckDone() for both
stopxlogfilename and the history file, and then stat() the stop WAL file:

    BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg,
                          startpoint.xrecoff % XLogSegSize);

    seconds_before_warning = 60;
    waits = 0;

    while (!XLogArchiveCheckDone(histfilepath, false) ||
           !XLogArchiveCheckDone(stopxlogfilename, false))
    {
        struct stat stat_buf;
        char        xlogpath[MAXPGPATH];

        /*
         * Check to see if file has already been archived and WAL file
         * removed by a concurrent checkpoint
         */
        snprintf(xlogpath, MAXPGPATH, XLOGDIR "/%s", stopxlogfilename);
        if (XLogArchiveCheckDone(histfilepath, false) &&
            stat(xlogpath, &stat_buf) != 0)
            break;

        CHECK_FOR_INTERRUPTS();

        pg_usleep(1000000L);

        if (++waits >= seconds_before_warning)
        {
            seconds_before_warning *= 2; /* This wraps in >10 years... */
            elog(WARNING, "pg_stop_backup() waiting for archive to complete "
                          "(%d seconds delay)", waits);
        }
    }
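
To make the warning schedule in the loop above concrete, here is a standalone model of just the counter logic, with no sleeping and no backend calls. The function name warning_times() and its output buffer are hypothetical, for illustration only.

```c
#include <stddef.h>

/*
 * Standalone model of the warning schedule (hypothetical helper, not part
 * of the patch).  Runs total_seconds iterations of the counter logic and
 * stores in out[] the values of "waits" at which a WARNING would be
 * emitted.  Returns the number of warnings, even if out[] is too small
 * to hold them all.
 */
static size_t
warning_times(int total_seconds, int *out, size_t out_len)
{
    int     seconds_before_warning = 60;
    int     waits = 0;
    size_t  n = 0;

    while (waits < total_seconds)
    {
        if (++waits >= seconds_before_warning)
        {
            /* Same doubling as in the loop: each warning waits twice as long */
            seconds_before_warning *= 2;
            if (n < out_len)
                out[n] = waits;
            n++;
        }
    }
    return n;
}
```

For a 600-second wait this reports four warnings, at 60, 120, 240 and 480 seconds, so a long-running archive is reported progressively less often rather than once a minute.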


--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
