Sunday, August 17, 2008

[ANNOUNCE] == PostgreSQL Weekly News - August 17 2008 ==

== PostgreSQL Weekly News - August 17 2008 ==

== PostgreSQL Jobs for August ==

http://archives.postgresql.org/pgsql-jobs/2008-08/threads.php

== PostgreSQL Local ==

The Prato Linux User Group will be having PostgreSQL talks in
September. The schedule in Italian is:
http://www.prato.linux.it/serate_a_tema_2008

PGCon Brazil 2008 will be on September 26-27 at Unicamp in Campinas.
http://pgcon.postgresql.org.br/index.en.html

PgDay.fr will be October 4 in Toulouse. The Call for Papers is open:
http://www.postgresqlfr.org/?q=node/1686
Registration:
http://www.pgday.fr/doku.php/inscription

PostgreSQL Conference West 2008 will be October 10-12 at Portland
State University in Portland, Oregon.
http://www.postgresqlconference.org/
Talk submission at:
http://www.postgresqlconference.org/west08/talk_submission/

Sponsor the European PGDay!
http://www.pgday.org/en/sponsors/campaign

The Call for Papers for European PGDay has begun.
http://www.pgday.org/en/call4papers

PGDay.(IT|EU) 2008 will be October 17 and 18 in Prato.
http://www.pgday.org/it/

== PostgreSQL in the News ==

Planet PostgreSQL: http://www.planetpostgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to david@fetter.org, German language
to pwn@pgug.de, Italian language to pwn@itpug.org.

== Applied Patches ==

Heikki Linnakangas committed:

- Introduce the concept of relation forks. An smgr relation can now
consist of multiple forks, and each fork can be created and grown
separately. The bulk of this patch is about changing the smgr API
to include an extra ForkNumber argument in every smgr function.
Also, smgrscheduleunlink and smgrdounlink no longer implicitly call
smgrclose, because other forks might still exist after unlinking
one. The callers of those functions have been modified to call
smgrclose instead. This patch in itself doesn't have any
user-visible effect, but provides the infrastructure needed for
upcoming patches. The additional forks envisioned are a rewritten
FSM implementation that doesn't rely on a fixed-size shared memory
block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.

- Relation forks patch requires a catversion bump due to changes in
the format of some WAL records, and two-phase state files, which I
forgot.

- pg_buffercache needs to be taught about relation forks, as Greg
Stark pointed out.

- Fix pull_up_simple_union_all to copy all rtable entries from child
subquery to parent, not only those with RangeTblRefs. We need them
in ExecCheckRTPerms. Report by Brendan O'Shea. Back-patch to 8.2,
where pull_up_simple_union_all was introduced.

Bruce Momjian committed:

- Add to TODO: "Add 'hostgss' pg_hba.conf option to allow GSS
link-level encryption."

- Add URL for TODO: "Improve ability to modify views via ALTER TABLE."

- Fix version warning bug in recently applied adjustments to psql
startup. Gregory Stark

- Add new SQL training web site to FAQ:

- Update Russian FAQ. Viktor Vislobokov.

- In pgsql/src/interfaces/libpq/bcc32.mak, synchronize Borland libpq
makefile to match MSVC. Backpatch to 8.3.X.

- Add to TODO: "Prevent query cancel packets from being replayed by an
attacker, especially when using SSL."

- Add URL for TODO: "Improve ability to modify views via ALTER TABLE."

- Update instructions on generating TODO.html

Peter Eisentraut committed:

- Remove TODO item, "Allow XML to accept more liberal DOCTYPE
specifications." Everything works correctly, per today's email to
-general.

Alvaro Herrera committed:

- Have autovacuum consider processing TOAST tables separately from
their main tables. This requires vacuum() to accept processing a
toast table standalone, so there's a user-visible change in that
it's now possible (for a superuser) to execute "VACUUM
pg_toast.pg_toast_XXX".

Tom Lane committed:

- Implement SEMI and ANTI joins in the planner and executor.
(Semijoins replace the old JOIN_IN code, but antijoins are new
functionality.) Teach the planner to convert appropriate EXISTS and
NOT EXISTS subqueries into semi and anti joins respectively. Also,
LEFT JOINs with suitable upper-level IS NULL filters are recognized
as being anti joins. Unify the InClauseInfo and OuterJoinInfo
infrastructure into "SpecialJoinInfo". With that change, it becomes
possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation. That
needs to be taken much further than this patch does, but the next
step is to change the API for oprjoin selectivity functions, which
seems like material for a separate patch. So for the moment the
output size estimates for semi and especially anti joins are quite
bogus.

- Performance fix for new anti-join code in nodeMergejoin.c: after
finding a match in antijoin mode, we should advance to next outer
tuple not next inner. We know we don't want to return this outer
tuple, and there is no point in advancing over matching inner tuples
now, because we'd just have to do it again if the next outer tuple
has the same merge key. This makes a noticeable difference if there
are lots of duplicate keys in both inputs. Similarly, after finding
a match in semijoin mode, arrange to advance to the next outer tuple
after returning the current match; or immediately, if it fails the
extra quals. The rationale is the same. (This is a performance bug
in existing releases; perhaps worth back-patching? The planner
tries to avoid using mergejoin with lots of duplicates, so it may
not be a big issue in practice.) Nestloop and hash got this right to
start with, but I made some cosmetic adjustments there to make the
corresponding bits of logic look more similar.

- Clean up the loose ends in selectivity estimation left by my patch
for semi and anti joins. To do this, pass the SpecialJoinInfo
struct for the current join as an additional optional argument to
operator join selectivity estimation functions. This allows the
estimator to tell not only what kind of join is being formed, but
which variable is on which side of the join; a requirement long
recognized but not dealt with till now. This also leaves the door
open for future improvements in the estimators, such as accounting
for the null-insertion effects of lower outer joins. I didn't do
anything about that in the current patch but the information is in
principle deducible from what's passed. The patch also clarifies
the definition of join selectivity for semi/anti joins: it's the
fraction of the left input that has (at least one) match in the
right input. This allows getting rid of some very fuzzy thinking
that I had committed in the original 7.4-era IN-optimization patch.
There's probably room to estimate this better than the present patch
does, but at least we know what to estimate. Since I had to touch
CREATE OPERATOR anyway to allow a variant signature for join
estimator functions, I took the opportunity to add a couple of
additional checks that were missing, per my recent message to
-hackers:
* Check that estimator functions return float8; Require execute
* permission at the time of CREATE OPERATOR on the
operator's function as well as the estimator functions;
* Require ownership of any pre-existing operator that's modified by
the command. I also moved the lookup of the functions out of
OperatorCreate() and into operatorcmds.c, since that seemed more
consistent with most of the other catalog object creation processes,
eg CREATE TYPE.

- Fix a couple of places where psql might fail to report a suitable
error if PQexec returns NULL. These don't seem significant enough
to be worth back-patching, but they ought to get fixed ...

- In pgsql/src/bin/pg_dump/pg_backup_db.c, fix pg_dump/pg_restore's
ExecuteSqlCommand() to behave suitably if PQexec returns NULL
instead of a PGresult. The former coding would fail, which is OK,
but it neglected to give you the PQerrorMessage that might tell you
why. In the oldest branches, there was another problem: it'd
sometimes report PQerrorMessage from the wrong connection.

- Improve sublink pullup code to handle ANY/EXISTS sublinks that are
at top level of a JOIN/ON clause, not only at top level of WHERE.
(However, we can't do this in an outer join's ON clause, unless the
ANY/EXISTS refers only to the nullable side of the outer join, so
that it can effectively be pushed down into the nullable side.) Per
request from Kevin Grittner. In passing, fix a bug in the initial
implementation of EXISTS pullup: it would Assert if the EXIST's
WHERE clause used a join alias variable. Since we haven't yet
flattened join aliases when this transformation happens, it's
necessary to include join relids in the computed set of RHS relids.

- In pgsql/src/backend/optimizer/plan/subselect.c, remove prohibition
against SubLinks in the WHERE clause of an EXISTS subquery that
we're considering pulling up. I hadn't wanted to think through
whether that could work during the first pass at this stuff.
However, on closer inspection it seems to be safe enough.

- In pgsql/src/backend/optimizer/path/joinrels.c, add some defenses
against constant-FALSE outer join conditions. Since
eval_const_expressions will generally throw away anything that's
ANDed with constant FALSE, what we're left with given an example
like select * from tenk1 a where (unique1,0) in (select unique2,1
from tenk1 b); is a cartesian product computation, which is really
not acceptable. This is a regression in CVS HEAD compared to
previous releases, which were able to notice the impossible join
condition in this case --- though not in some related cases that are
also improved by this patch, such as select * from tenk1 a left join
tenk1 b on (a.unique1=b.unique2 and 0=1); Fix by skipping evaluation
of the appropriate side of the outer join in cases where it's
demonstrably unnecessary.

Magnus Hagander committed:

- Make the temporary directory for pgstat files configurable by the
GUC variable stats_temp_directory, instead of requiring the admin to
mount/symlink the pg_stat_tmp directory manually. For now the
config variable is PGC_POSTMASTER. Room for further improvment that
would allow it to be changed on-the-fly. Original idea by Euler
Taveira de Oliveira.

- In pgsql/src/tools/msvc/Solution.pm, probes.h is generated from
probes.d, not pg_trace.d.

== Rejected Patches (for now) ==

No one was disappointed this week :-)

== Pending Patches ==

Zdenek Kotala sent in a patch to introduce a page layout footprint in
aid of his in-place upgrades work.

David Wheeler sent in another patch to clean up the citext contrib
module.

Zdenek Kotala sent in a WIP patch for his new page API in aid of his
in-place upgrades work.

ITAGAKI Takahiro sent in another revision of his "Copy storage
parameters" patch.

Gregory Stark sent in a patch to fix a bug in psql where old variables
were being used for the new database connection after \c.

ITAGAKI Takahiro sent in a patch to add duration option (-T) to
pgbench instead of number of transactions (-t). -t and -T are mutually
exclusive.

Jan Urbanski sent in two more revisions of his oprrest patch for text
search.

Dmitri Koterov sent in a patch to add three new functions to
contrib/intarray: int_array_append_aggregate(int[]), which quickly
merges arrays, _int_group_count_sort(int[], bool), a frequency-based
sort, and bidx(int[], int), a binary search in a sorted array.

Martin Pihlak sent in a patch that implements plan invalidation on
function DROP, REPLACE and ALTER.

Ryan Bradetich sent in a patch to implement unsigned integer types.

Magnus Hagander sent in a patch to do pg_hba.conf and postgresql.conf
parsing in the postmaster and matching in the backend. Per
discussion.


---------------------------(end of broadcast)---------------------------
-To unsubscribe from this list, send an email to:

pgsql-announce-unsubscribe@postgresql.org

No comments: