Friday, August 8, 2008

Re: [HACKERS] For what should pg_stop_backup wait?

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.316
diff -c -r1.316 xlog.c
*** src/backend/access/transam/xlog.c 13 Jul 2008 20:45:47 -0000 1.316
--- src/backend/access/transam/xlog.c 8 Aug 2008 13:56:40 -0000
***************
*** 1165,1170 ****
--- 1165,1184 ----
/* Retry creation of the .ready file */
if (create_if_missing)
XLogArchiveNotify(xlog);
+ else
+ {
+ char xlogpath[MAXPGPATH];
+
+ snprintf(xlogpath, MAXPGPATH, XLOGDIR "/%s", xlog);
+
+ /*
+ * Check to see if the WAL file has been removed by checkpoint,
+ * which implies it has already been archived, and explains why we
+ * can't see a status file for it.
+ */
+ if (stat(xlogpath, &stat_buf) != 0)
+ return true;
+ }

return false;
}
***************
*** 6721,6735 ****
CleanupBackupHistory();

/*
! * Wait until the history file has been archived. We assume that the
! * alphabetic sorting property of the WAL files ensures the last WAL
! * file is guaranteed archived by the time the history file is archived.
*
* We wait forever, since archive_command is supposed to work and
* we assume the admin wanted his backup to work completely. If you
* don't wish to wait, you can SET statement_timeout = xx;
*
! * If the status file is missing, we assume that is because it was
* set to .ready before we slept, then while asleep it has been set
* to .done and then removed by a concurrent checkpoint.
*/
--- 6735,6748 ----
CleanupBackupHistory();

/*
! * Wait until both the last WAL file filled during backup and the
! * history file have been archived.
*
* We wait forever, since archive_command is supposed to work and
* we assume the admin wanted his backup to work completely. If you
* don't wish to wait, you can SET statement_timeout = xx;
*
! * If the status files are missing, we assume that is because it was
* set to .ready before we slept, then while asleep it has been set
* to .done and then removed by a concurrent checkpoint.
*/
***************
*** 6739,6745 ****
seconds_before_warning = 60;
waits = 0;

! while (!XLogArchiveCheckDone(histfilepath, false))
{
CHECK_FOR_INTERRUPTS();

--- 6752,6759 ----
seconds_before_warning = 60;
waits = 0;

! while (!XLogArchiveCheckDone(histfilepath, false) ||
! !XLogArchiveCheckDone(stopxlogfilename, false))
{
CHECK_FOR_INTERRUPTS();

On Fri, 2008-08-08 at 12:57 +0100, Simon Riggs wrote:

> > Yes, statement_timeout may help. But I don't want to use it, because it
> > would cancel a backup that has otherwise *succeeded*.
> >
> > How about checking whether the stop point was archived by comparing it with
> > the last WAL file archived? The archiver process can report the last WAL
> > file archived, or we can work it out from the status files.
>
> I think it's easier to test whether the stopxlogfilename still exists in
> pg_xlog. If not, we know it has been archived away. We can add that as
> an extra condition inside the loop.
>
> So I'm thinking we should test XLogArchiveCheckDone() for both
> stopxlogfilename and the history file, and then stat the stop WAL file:

This seems better.
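
For anyone following along, the check described above can be sketched outside the backend roughly as follows. This is only a standalone illustration under stated assumptions, not the patch itself: file_exists(), segment_archived(), backup_fully_archived() and the two directory parameters are hypothetical names, and the real XLogArchiveCheckDone() also handles the case where archiving is disabled and the create_if_missing retry, both of which are left out here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

#define SKETCH_MAXPATH 1024		/* hypothetical stand-in for MAXPGPATH */

/* Hypothetical helper: true if "path" can be stat()ed. */
static bool
file_exists(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0;
}

/*
 * Hypothetical stand-in for the per-file check: a file counts as archived
 * if it has a .done status file, or if it has no status file at all and
 * the file itself is gone (a checkpoint removed it after archiving).
 */
static bool
segment_archived(const char *xlogdir, const char *statusdir, const char *name)
{
	char		path[SKETCH_MAXPATH];

	/* .done means the archiver has finished with this file */
	snprintf(path, sizeof(path), "%s/%s.done", statusdir, name);
	if (file_exists(path))
		return true;

	/* .ready means archiving is still pending */
	snprintf(path, sizeof(path), "%s/%s.ready", statusdir, name);
	if (file_exists(path))
		return false;

	/* no status file: archived only if the file itself has been removed */
	snprintf(path, sizeof(path), "%s/%s", xlogdir, name);
	return !file_exists(path);
}

/* The wait loop would keep sleeping until BOTH files pass the check. */
static bool
backup_fully_archived(const char *xlogdir, const char *statusdir,
					  const char *histfile, const char *stopxlog)
{
	return segment_archived(xlogdir, statusdir, histfile) &&
		segment_archived(xlogdir, statusdir, stopxlog);
}
```

The point of the final stat() is the race described in the patch comment: if both status files have vanished while we slept, the absence of the WAL file itself is what tells us it was archived rather than never notified.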

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
