Saturday, May 31, 2008

[PATCHES] synchronized scan: reset state at end of scan

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c1afff3..b5bf780 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1234,6 +1234,9 @@ heap_endscan(HeapScanDesc scan)

if (scan->rs_strategy != NULL)
FreeAccessStrategy(scan->rs_strategy);
+
+ if (scan->rs_syncscan)
+ ss_reset_location(scan->rs_rd);

pfree(scan);
}
diff --git a/src/backend/access/heap/syncscan.c b/src/backend/access/heap/syncscan.c
index dfc7265..5b2aa66 100644
--- a/src/backend/access/heap/syncscan.c
+++ b/src/backend/access/heap/syncscan.c
@@ -34,6 +34,8 @@
* INTERFACE ROUTINES
* ss_get_location - return current scan location of a relation
* ss_report_location - update current scan location
+ * ss_reset_location - reset location to zero if started by this
+ * process
*
*
* Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
@@ -91,6 +93,7 @@ bool trace_syncscan = false;
typedef struct ss_scan_location_t
{
RelFileNode relfilenode; /* identity of a relation */
+ pid_t pid; /* PID of the process that set the scan location */
BlockNumber location; /* last-reported location in the relation */
} ss_scan_location_t;

@@ -161,6 +164,7 @@ SyncScanShmemInit(void)
item->location.relfilenode.spcNode = InvalidOid;
item->location.relfilenode.dbNode = InvalidOid;
item->location.relfilenode.relNode = InvalidOid;
+ item->location.pid = 0;
item->location.location = InvalidBlockNumber;

item->prev = (i > 0) ?
@@ -212,6 +216,13 @@ ss_search(RelFileNode relfilenode, BlockNumber location, bool set)
else if (set)
item->location.location = location;

+ /*
+ * If we are starting a new scan, set the pid to that of the
+ * current process.
+ */
+ if(!set)
+ item->location.pid = MyProcPid;
+
/* Move the entry to the front of the LRU list */
if (item != scan_locations->head)
{
@@ -319,3 +330,53 @@ ss_report_location(Relation rel, BlockNumber location)
#endif
}
}
+
+/*
+ * ss_reset_location --- reset location to zero if started by this process
+ *
+ * When a scan finishes, it can remove itself from the list of
+ * scan_locations. This means that when scans are no longer being run
+ * concurrently, new scans will again be started at the beginning of the
+ * heap. This is not required for correctness.
+ *
+ * The scan_location entry holds the pid of the most recently started scan,
+ * and when a scan finishes, it resets the entry to zero if and only if the
+ * pid in the entry matches that of the current process.
+ *
+ * When concurrent scans are active, it is unlikely that the most
+ * recently started scan will finish first, so the hint will usually not
+ * be removed unless this is the only scan on that relation. If the scans
+ * are merely started at nearly the same time, and the last one to start
+ * happens to finish first, there would be little benefit from
+ * synchronizing with a nearly-complete scan, anyway.
+ *
+ * In the rare case that the most recently started scan does finish
+ * significantly before older concurrent scans (such as in the case of a
+ * LIMIT clause), the hint will most likely be quickly filled by a location
+ * report from one of those older scans. If another scan begins during that
+ * narrow window, it will not have the benefit of being synchronized with
+ * the older concurrent scans.
+ *
+ * If we can't get the lock without waiting, then we do nothing.
+ */
+void ss_reset_location(Relation rel)
+{
+ ss_lru_item_t *item;
+
+ if(LWLockConditionalAcquire(SyncScanLock, LW_EXCLUSIVE)) {
+ item = scan_locations->head;
+
+ while(item != NULL) {
+ if(item->location.pid == MyProcPid &&
+ RelFileNodeEquals(item->location.relfilenode, rel->rd_node)) {
+ /* Concurrent scans may still be active on this relation,
+ * we only know that this scan has finished. So, we just
+ * set the location back to zero rather than remove it.
+ */
+ item->location.location = 0;
+ }
+ item = item->next;
+ }
+ LWLockRelease(SyncScanLock);
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f29f835..7950c33 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -133,6 +133,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
/* in heap/syncscan.c */
extern void ss_report_location(Relation rel, BlockNumber location);
extern BlockNumber ss_get_location(Relation rel, BlockNumber relnblocks);
+extern void ss_reset_location(Relation rel);
extern void SyncScanShmemInit(void);
extern Size SyncScanShmemSize(void);

I was looking into supporting synchronized scans for VACUUM, and I
noticed that we currently don't remove the reported scan location as
this post suggests:

http://archives.postgresql.org/pgsql-patches/2007-06/msg00047.php

There was some debate about whether it should be done, but I thought
that the solution here seemed to satisfy most people's concerns:

http://archives.postgresql.org/pgsql-patches/2007-06/msg00052.php

I attached a patch that implements the above idea.

The benefit is that if you have a singular scan, it will start at the
beginning of the heap and not at some arbitrary place.

The cost is that it's not 100% guaranteed that the location entry will
be removed. The backend that started the scan could abort, die, etc.
Also, in rare situations there is a small window created where a new
scan might not be synchronized with existing concurrent scans. This is
really only an issue when issuing queries with limits or issuing two
scans that progress at different rates. I think it's somewhat reasonable
to say that if you're doing either of those things, you shouldn't be too
surprised if it messes with synchronized scanning. I have more
information in the comments in the attached patch.

I do not have a strong opinion about whether this patch is applied or
not. I am submitting this just for the sake of completeness.

Regards,
Jeff Davis

No comments: