Saturday, May 31, 2008

Re: [pgsql-advocacy] Feedback on blog post about Replication Feature decision and its impact

On Saturday 31 May 2008 14:48:26 Josh Berkus wrote:
> Dirk,
>
> > <a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
> > database startup whose product builds on PostgreSQL. EnterpriseDB adds
> > many "enterprise-readiness" features to the basic PostgreSQL product,
> > including database replication, and much more.
>
> The replication-in-core vs. not-in-core debate has absolutely nothing to do
> with EnterpriseDB either way, or with the GPL. I think you'd be doing a
> disservice to your readers by implying that it does. If you want to blog
> about these things, maybe break them up into separate posts?
>

Adding any feature into core Postgres can have a ripple effect on
EnterpriseDB, so I don't see any reason not to discuss these issues. I'd be
more concerned about the factual errors in the post (things like "core team of
seven committers") than exactly where the OP thinks the discussion should go.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

--
Sent via pgsql-advocacy mailing list (pgsql-advocacy@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-advocacy

[HACKERS] phrase search

Index: src/backend/utils/adt/Makefile
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/utils/adt/Makefile,v
retrieving revision 1.69
diff -u -r1.69 Makefile
--- src/backend/utils/adt/Makefile 19 Feb 2008 10:30:08 -0000 1.69
+++ src/backend/utils/adt/Makefile 31 May 2008 19:57:34 -0000
@@ -29,7 +29,7 @@
tsginidx.o tsgistidx.o tsquery.o tsquery_cleanup.o tsquery_gist.o \
tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
tsvector.o tsvector_op.o tsvector_parser.o \
- txid.o uuid.o xml.o
+ txid.o uuid.o xml.o phrase_search.o

like.o: like.c like_match.c

Index: src/backend/utils/adt/phrase_search.c
===================================================================
RCS file: src/backend/utils/adt/phrase_search.c
diff -N src/backend/utils/adt/phrase_search.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/backend/utils/adt/phrase_search.c 31 May 2008 19:56:59 -0000
@@ -0,0 +1,167 @@
+#include "postgres.h"
+
+#include "tsearch/ts_type.h"
+#include "tsearch/ts_utils.h"
+
+#include "fmgr.h"
+
+#ifdef PG_MODULE_MAGIC
+PG_MODULE_MAGIC;
+#endif
+
+PG_FUNCTION_INFO_V1(is_phrase_present);
+Datum is_phrase_present(PG_FUNCTION_ARGS);
+
+typedef struct {
+ WordEntryPosVector *posVector;
+ int4 posInPhrase;
+ int4 curpos;
+} PhraseInfo;
+
+static int
+WordCompareVectorEntry(char *eval, WordEntry *ptr, ParsedWord *prsdword)
+{
+ if (ptr->len == prsdword->len)
+ return strncmp(
+ eval + ptr->pos,
+ prsdword->word,
+ prsdword->len);
+
+ return (ptr->len > prsdword->len) ? 1 : -1;
+}
+
+/*
+ * Returns a pointer to a WordEntry from tsvector t corresponding to prsdword.
+ * Returns NULL if not found.
+ */
+static WordEntry *
+find_wordentry_prsdword(TSVector t, ParsedWord *prsdword)
+{
+ WordEntry *StopLow = ARRPTR(t);
+ WordEntry *StopHigh = (WordEntry *) STRPTR(t);
+ WordEntry *StopMiddle;
+ int difference;
+
+ /* Loop invariant: StopLow <= item < StopHigh */
+
+ while (StopLow < StopHigh)
+ {
+ StopMiddle = StopLow + (StopHigh - StopLow) / 2;
+ difference = WordCompareVectorEntry(STRPTR(t), StopMiddle, prsdword);
+ if (difference == 0)
+ return StopMiddle;
+ else if (difference < 0)
+ StopLow = StopMiddle + 1;
+ else
+ StopHigh = StopMiddle;
+ }
+
+ return NULL;
+}
+
+
+static int4
+check_and_advance(int4 i, PhraseInfo *phraseInfo)
+{
+ WordEntryPosVector *posvector1, *posvector2;
+ int4 diff;
+
+ posvector1 = phraseInfo[i].posVector;
+ posvector2 = phraseInfo[i+1].posVector;
+
+ diff = phraseInfo[i+1].posInPhrase - phraseInfo[i].posInPhrase;
+ while (posvector2->pos[phraseInfo[i+1].curpos] - posvector1->pos[phraseInfo[i].curpos] < diff)
+ if (phraseInfo[i+1].curpos >= posvector2->npos - 1)
+ return 2;
+ else
+ phraseInfo[i+1].curpos += 1;
+
+ if (posvector2->pos[phraseInfo[i+1].curpos] - posvector1->pos[phraseInfo[i].curpos] == diff)
+ return 1;
+ else
+ return 0;
+}
+
+int4
+initialize_phraseinfo(ParsedText *prs, TSVector t, PhraseInfo *phraseInfo)
+{
+ WordEntry *entry;
+ int4 i;
+
+ for (i = 0; i < prs->curwords; i++)
+ {
+ phraseInfo[i].posInPhrase = prs->words[i].pos.pos;
+ entry = find_wordentry_prsdword(t, &(prs->words[i]));
+ if (entry == NULL)
+ return 0;
+ else
+ phraseInfo[i].posVector = _POSVECPTR(t, entry);
+ }
+ return 1;
+}
+Datum
+is_phrase_present(PG_FUNCTION_ARGS)
+{
+ ParsedText prs;
+ int4 numwords, i, retval, found = 0;
+ PhraseInfo *phraseInfo;
+ text *phrase = PG_GETARG_TEXT_P(0);
+ TSVector t = PG_GETARG_TSVECTOR(1);
+ Oid cfgId = getTSCurrentConfig(true);
+
+ prs.lenwords = (VARSIZE(phrase) - VARHDRSZ) / 6; /* just an estimate of the number of words */
+ if (prs.lenwords == 0)
+ prs.lenwords = 2;
+ prs.curwords = 0;
+ prs.pos = 0;
+ prs.words = (ParsedWord *) palloc0(sizeof(ParsedWord) * prs.lenwords);
+
+ parsetext(cfgId, &prs, VARDATA(phrase), VARSIZE(phrase) - VARHDRSZ);
+
+ /* allocate and initialize */
+ numwords = prs.curwords;
+ phraseInfo = palloc0(numwords * sizeof(PhraseInfo));
+
+
+ if (numwords > 0 && initialize_phraseinfo(&prs, t, phraseInfo))
+ {
+ /* if there is only one word and we are able to initialize
+ * the phraseInfo then it is a success */
+ if (numwords == 1)
+ found = 1;
+ while (!found)
+ {
+ retval = check_and_advance(0, phraseInfo);
+ if (retval == 2)
+ break;
+ else if (retval == 1)
+ {
+ for (i = 1; i < numwords - 1; i++)
+ {
+ retval = check_and_advance(i, phraseInfo);
+ if (retval == 2 || retval == 0)
+ break;
+ }
+ if (i >= numwords - 1)
+ /* found a phrase */
+ found = 1;
+ else if (retval == 2)
+ /* no chance of finding a phrase */
+ break;
+ }
+ /* now advance the position of the 0th word */
+ if (phraseInfo[0].curpos >= phraseInfo[0].posVector->npos - 1)
+ /* no chance of finding a phrase */
+ break;
+ else
+ phraseInfo[0].curpos += 1;
+ }
+ }
+ pfree(phraseInfo);
+ pfree(prs.words);
+ /* prepare to leave */
+ PG_FREE_IF_COPY(phrase, 0);
+ PG_FREE_IF_COPY(t, 1);
+ /* return 1 if the phrase was found, 0 otherwise */
+ PG_RETURN_INT32(found);
+}
I have attached a patch for phrase search with respect to the CVS head.
Basically it takes a phrase (text) and a TSVector. It checks whether the
relative positions of the lexemes in the phrase are the same as their
relative positions in the TSVector.

If the configuration for text search is "simple", then this will produce
exact phrase search. Otherwise the stopwords in a phrase will be ignored
and the words in a phrase will only be matched with the stemmed lexeme.

For my application I am using this as a separate shared object. I do not
know how to expose this function from the core. Can someone explain how
to do this?
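
In the meantime, the module version is declared with SQL along these
lines (a sketch; the library path and the sample strings are illustrative,
and it assumes default_text_search_config is 'english'):

-- Declare the C function from the compiled shared object.
CREATE OR REPLACE FUNCTION is_phrase_present(text, tsvector)
RETURNS int4
AS '$libdir/phrase_search', 'is_phrase_present'
LANGUAGE C STABLE STRICT;

-- Returns 1: "brown" and "fox" occur at adjacent positions in both
-- the phrase and the tsvector.
SELECT is_phrase_present('brown fox',
       to_tsvector('english', 'the quick brown fox jumps'));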

I saw this discussion on phrase search and I am not sure what other
functionality is required.

http://archives.postgresql.org/pgsql-general/2008-02/msg01170.php

-Sushant.

Re: [HACKERS] Overhauling GUCS

Greg Sabino Mullane wrote:

> * Much more verbose comments. The abovementioned default_statistics_target
> is a very important setting, but there is zero explanation in the file
> of what it is. The only thing we're told is that it ranges from 10 - 1000.
> We can do better than that. Users would absolutely love it if each item
> had a clear explanation, and it would be well worth a slightly increased
> file size. See the postfix main.cf file for a good example of such.

I kind of agree with this but actually think we should have the bare
minimum of comments in the file. Why? Because our documentation links are
static. Each setting should carry the URL of its full documentation.

>
> * Create a tool, or at least a best practices, for controlling and tracking
> changes to the file.
>

This I disagree with. There are plenty of tools to handle this should
someone really want to. SVN, CVS, parrot, etc... Let systems management
be the domain of systems management.


>
> * Put some doc URLs in the file, at the minimum per major section. At the
> very bare minimum, a real URL at the top.
>

Hah! See above :)


>
> * Indicate which values can be changed per-session or per-role.
>

Agreed. Along with this, although off-topic for this post, we should have a
grid in the docs that is explicit about this.

>
> * Fix the disparity between the syntax in the file and the SET interface.
> For example, work_mem = 16MB works in the conf file, but you have to write
> SET work_mem = '16MB'. Easiest is probably just to quote everything in the conf.

Agreed.

>
> * I'm tempted by the argument of creating a separate file for the obscure
> settings, but I think it would be too much pain, and nobody would ever agree on
> which settings are 'important' and which are 'obscure'.
>

Actually I could buy into this. There really are only about a dozen
must-change settings (if that). I could see something like:

include 'etc/network.conf'   # network settings
include 'etc/memory.conf'    # memory settings
include 'etc/logging.conf'   # logging settings

etc...


>
> * It might be nice to mention other ways to reload the file, such as
> 'service postgresql reload', or whatever Windows uses.
>

I think a URL to the docs is a better idea here.

>
> * The word 'paramters' is still misspelled. :)
>

Heh.

> * Since the executable is now named "postgres" (thank goodness we got
> rid of "postmaster"), the file should be named 'postgres.conf'. This would
> also be a way to quickly distinguish 'old' vs 'new' style conf files if
> we end up making major changes to it.

It was never postmaster.conf (that I can recall). I don't see the issue
here. Consider Apache... its config file isn't apache.conf either.

I think postgresql.conf (as that is the name of the software) makes sense.

Sincerely,

Joshua D. Drake

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] synchronized scans for VACUUM

Jeff Davis <pgsql@j-davis.com> writes:
> The objections to synchronized scans for VACUUM as listed in that thread
> (summary):

> 2. vacuum takes breaks from the scan to clean up the indexes when it
> runs out of maintenance_work_mem.

> 2. There have been suggestions about a more compact representation for
> the tuple id list. If this works, it will solve this problem.

It will certainly not "solve" the problem. What it will do is mean that
the breaks are further apart and longer, which seems to me to make the
conflict with syncscan behavior worse not better.

> 3. vacuum takes breaks for the cost delay

> 3. Offering synchronized vacuums could reduce the need for these
> elective pauses.

How so? A vacuum that happens not to be part of a syncscan herd is
going to be just as bad for system performance as ever.


It still seems to me that vacuum is unlikely to be a productive member
of a syncscan herd --- it just isn't going to have similar scan-speed
behavior to typical queries.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCHES] synchronized scan: reset state at end of scan

Jeff Davis <pgsql@j-davis.com> writes:
> I was looking into supporting synchronized scans for VACUUM, and I
> noticed that we currently don't remove the reported scan location as
> this post suggests:

> http://archives.postgresql.org/pgsql-patches/2007-06/msg00047.php

I thought the end conclusion of that thread was to not do anything,
on the grounds that
(1) having new scans sometimes fail to join an existing syncscan
herd would be a bad thing because of the resulting performance
uncertainty;
(2) partially masking the order-nondeterminism created by syncscans
would be a bad thing because it would make it more likely for people
to not notice the issue during testing.

regards, tom lane

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: [HACKERS] Overhauling GUCS

Greg Sabino Mullane wrote:
>
> * A weak +1 to the #include idea. I'm much more inclined to simply
> add a new section to the bottom of the file and use version
> control, but I could see ways in which include would be useful.
>
>
>
>

We already have include directives, and have had since 8.2.

Thus spake the docs:

"In addition to parameter settings, the postgresql.conf file can contain
/include directives/, which specify another file to read and process as
if it were inserted into the configuration file at this point. Include
directives simply look like:

include 'filename'

If the file name is not an absolute path, it is taken as relative to the
directory containing the referencing configuration file. Inclusions can
be nested."


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] Defining character sets for individual fields

On May 31, 2008, at 6:22 PM, Ram Ravichandran wrote:

> Hi,
>
> By default, my postgresql server is set to use UTF8 character set. I
> was wondering if there is any way to make sure that certain fields
> like url etc. only make use of ASCII. My main aim is to save space
> by using only 1 byte / character for urls (some of the urls are
> over 200 characters long). Is this possible? Or are all characters
> eventually converted to UTF8 during storage?

An ASCII string and its UTF8 representation take exactly the same
number of bytes, so if space is your concern it's a non-issue.
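
For instance (the table and column names in the last statement are
hypothetical):

-- Pure ASCII text: one byte per character, even in a UTF8 database.
SELECT octet_length('http://example.com/') AS bytes,
       char_length('http://example.com/') AS chars;
-- both return 19

-- To guarantee a column stays ASCII-only anyway, comparing the two
-- lengths in a CHECK constraint is enough:
ALTER TABLE pages ADD CONSTRAINT url_is_ascii
    CHECK (octet_length(url) = char_length(url));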

Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Defining character sets for individual fields

Hi,

By default, my postgresql server is set to use UTF8 character set. I was wondering if there is any way to make sure that certain fields like url etc. only make use of ASCII. My main aim is to save space by using only 1 byte / character for urls (some of the urls are over 200 characters long). Is this possible? Or are all characters eventually converted to UTF8 during storage?

Thanks,

Ram

[HACKERS] synchronized scans for VACUUM

Previous thread for reference:

http://archives.postgresql.org/pgsql-patches/2007-06/msg00096.php

The objections to synchronized scans for VACUUM as listed in that thread
(summary):

1. vacuum sometimes progresses faster than a regular heapscan, because
it doesn't need to check WHERE clauses, etc.

2. vacuum takes breaks from the scan to clean up the indexes when it
runs out of maintenance_work_mem.

3. vacuum takes breaks for the cost delay

4. vacuum will dirty a lot of the blocks as it goes, and that will cause
some kind of interaction with the ring buffer

I'd like to address these one by one to see what problems are really in
our way:

1. This would mean that it's not an I/O limited scan. I think as long as
we're talking about regular table scans that can benefit from
synchronized scanning, a vacuum of the same table would also benefit. A
microbenchmark could show whether some benefit exists or not.

2. There have been suggestions about a more compact representation for
the tuple id list. If this works, it will solve this problem.

3. Offering synchronized vacuums could reduce the need for these
elective pauses.

4. This probably has more to do with the buffer ring than synchronized
scans. There could be some bad interaction there, but I don't see that
it's clearly bad.

Additionally, with the possible exception of #4, I don't see the
situation being worse than it is currently.

Thoughts?

Regards,
Jeff Davis


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] Converting empty input strings to Nulls

Hi Ken-

Have you looked at encode?
http://www.postgresql.org/docs/8.3/interactive/functions-string.html

Anyone else?
Martin
----- Original Message -----
From: "Ken Winter" <ken@sunward.org>
To: "PostgreSQL pg-general List" <pgsql-general@postgresql.org>
Sent: Saturday, May 31, 2008 1:40 PM
Subject: [GENERAL] Converting empty input strings to Nulls


Applications accessing my PostgreSQL 8.0 database like to submit no-value
date column values as empty strings rather than as Nulls. This, of course,
causes this PG error:

SQL State: 22007
ERROR: invalid input syntax for type date: ""

I'm looking for a way to trap this bad input at the database level, quietly
convert the input empty strings to Null, and store the Null in the date
column. I tried a BEFORE INSERT OR UPDATE trigger invoking this function ...

CREATE OR REPLACE FUNCTION "public"."empty_string_to_null"()
RETURNS trigger AS
$BODY$
BEGIN
IF CAST(NEW.birth_date AS text) = '' THEN
NEW.birth_date = Null;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

... but an empty string still raises the error even before this function is
triggered.

Is there a way to convert empty strings to Nulls before the error is raised?

~ TIA
~ Ken


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[PATCHES] synchronized scan: reset state at end of scan

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c1afff3..b5bf780 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1234,6 +1234,9 @@ heap_endscan(HeapScanDesc scan)

if (scan->rs_strategy != NULL)
FreeAccessStrategy(scan->rs_strategy);
+
+ if (scan->rs_syncscan)
+ ss_reset_location(scan->rs_rd);

pfree(scan);
}
diff --git a/src/backend/access/heap/syncscan.c b/src/backend/access/heap/syncscan.c
index dfc7265..5b2aa66 100644
--- a/src/backend/access/heap/syncscan.c
+++ b/src/backend/access/heap/syncscan.c
@@ -34,6 +34,8 @@
* INTERFACE ROUTINES
* ss_get_location - return current scan location of a relation
* ss_report_location - update current scan location
+ * ss_reset_location - reset location to zero if started by this
+ * process
*
*
* Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
@@ -91,6 +93,7 @@ bool trace_syncscan = false;
typedef struct ss_scan_location_t
{
RelFileNode relfilenode; /* identity of a relation */
+ pid_t pid; /* PID of the process that set the scan location */
BlockNumber location; /* last-reported location in the relation */
} ss_scan_location_t;

@@ -161,6 +164,7 @@ SyncScanShmemInit(void)
item->location.relfilenode.spcNode = InvalidOid;
item->location.relfilenode.dbNode = InvalidOid;
item->location.relfilenode.relNode = InvalidOid;
+ item->location.pid = 0;
item->location.location = InvalidBlockNumber;

item->prev = (i > 0) ?
@@ -212,6 +216,13 @@ ss_search(RelFileNode relfilenode, BlockNumber location, bool set)
else if (set)
item->location.location = location;

+ /*
+ * If we are starting a new scan, set the pid to that of the
+ * current process.
+ */
+ if (!set)
+ item->location.pid = MyProcPid;
+
/* Move the entry to the front of the LRU list */
if (item != scan_locations->head)
{
@@ -319,3 +330,53 @@ ss_report_location(Relation rel, BlockNumber location)
#endif
}
}
+
+/*
+ * ss_reset_location --- reset location to zero if started by this process
+ *
+ * When a scan finishes, it can remove itself from the list of
+ * scan_locations. This means that when scans are no longer being run
+ * concurrently, new scans will again be started at the beginning of the
+ * heap. This is not required for correctness.
+ *
+ * The scan_location entry holds the pid of the most recently started scan,
+ * and when a scan finishes, it resets the entry to zero if and only if the
+ * pid in the entry matches that of the current process.
+ *
+ * When concurrent scans are active, it is unlikely that the most
+ * recently started scan will finish first, so the hint will usually not
+ * be removed unless this is the only scan on that relation. If the scans
+ * are merely started at nearly the same time, and the last one to start
+ * happens to finish first, there would be little benefit from
+ * synchronizing with a nearly-complete scan, anyway.
+ *
+ * In the rare case that the most recently started scan does finish
+ * significantly before older concurrent scans (such as in the case of a
+ * LIMIT clause), the hint will most likely be quickly filled by a location
+ * report from one of those older scans. If another scan begins during that
+ * narrow window, it will not have the benefit of being synchronized with
+ * the older concurrent scans.
+ *
+ * If we can't get the lock without waiting, then we do nothing.
+ */
+void ss_reset_location(Relation rel)
+{
+ ss_lru_item_t *item;
+
+ if(LWLockConditionalAcquire(SyncScanLock, LW_EXCLUSIVE)) {
+ item = scan_locations->head;
+
+ while(item != NULL) {
+ if(item->location.pid == MyProcPid &&
+ RelFileNodeEquals(item->location.relfilenode, rel->rd_node)) {
+ /* Concurrent scans may still be active on this relation,
+ * we only know that this scan has finished. So, we just
+ * set the location back to zero rather than remove it.
+ */
+ item->location.location = 0;
+ }
+ item = item->next;
+ }
+ LWLockRelease(SyncScanLock);
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f29f835..7950c33 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -133,6 +133,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
/* in heap/syncscan.c */
extern void ss_report_location(Relation rel, BlockNumber location);
extern BlockNumber ss_get_location(Relation rel, BlockNumber relnblocks);
+extern void ss_reset_location(Relation rel);
extern void SyncScanShmemInit(void);
extern Size SyncScanShmemSize(void);

I was looking into supporting synchronized scans for VACUUM, and I
noticed that we currently don't remove the reported scan location as
this post suggests:

http://archives.postgresql.org/pgsql-patches/2007-06/msg00047.php

There was some debate about whether it should be done, but I thought
that the solution here seemed to satisfy most people's concerns:

http://archives.postgresql.org/pgsql-patches/2007-06/msg00052.php

I attached a patch that implements the above idea.

The benefit is that if you have a singular scan, it will start at the
beginning of the heap and not at some arbitrary place.

The cost is that it's not 100% guaranteed that the location entry will
be removed. The backend that started the scan could abort, die, etc.
Also, in rare situations there is a small window created where a new
scan might not be synchronized with existing concurrent scans. This is
really only an issue when issuing queries with limits or issuing two
scans that progress at different rates. I think it's somewhat reasonable
to say that if you're doing either of those things, you shouldn't be too
surprised if it messes with synchronized scanning. I have more
information in the comments in the attached patch.

I do not have a strong opinion about whether this patch is applied or
not. I am submitting this just for the sake of completeness.

Regards,
Jeff Davis

Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

Index: src/backend/tsearch/dict.c
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/dict.c,v
retrieving revision 1.5
diff -u -r1.5 dict.c
--- src/backend/tsearch/dict.c 25 Mar 2008 22:42:43 -0000 1.5
+++ src/backend/tsearch/dict.c 30 May 2008 23:20:57 -0000
@@ -16,6 +16,7 @@
#include "catalog/pg_type.h"
#include "tsearch/ts_cache.h"
#include "tsearch/ts_utils.h"
+#include "tsearch/ts_public.h"
#include "utils/builtins.h"


Index: src/backend/tsearch/to_tsany.c
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/to_tsany.c,v
retrieving revision 1.12
diff -u -r1.12 to_tsany.c
--- src/backend/tsearch/to_tsany.c 16 May 2008 16:31:01 -0000 1.12
+++ src/backend/tsearch/to_tsany.c 31 May 2008 08:43:27 -0000
@@ -15,6 +15,7 @@

#include "catalog/namespace.h"
#include "tsearch/ts_cache.h"
+#include "tsearch/ts_public.h"
#include "tsearch/ts_utils.h"
#include "utils/builtins.h"
#include "utils/syscache.h"
Index: src/backend/tsearch/ts_parse.c
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/ts_parse.c,v
retrieving revision 1.8
diff -u -r1.8 ts_parse.c
--- src/backend/tsearch/ts_parse.c 16 May 2008 16:31:01 -0000 1.8
+++ src/backend/tsearch/ts_parse.c 31 May 2008 08:09:32 -0000
@@ -485,7 +485,11 @@
addHLParsedLex(HeadlineParsedText *prs, TSQuery query, ParsedLex *lexs, TSLexeme *norms)
{
ParsedLex *tmplexs;
+ ParsedText *prstxt;
TSLexeme *ptr;
+ int startpos, endpos, pos;
+
+ startpos = prs->curwords;

while (lexs)
{
@@ -505,6 +509,29 @@
lexs = tmplexs;
}

+ /* update the corresponding ParsedText */
+ pos = startpos;
+ endpos = prs->curwords;
+ prstxt = &(prs->prstxt);
+ for (ptr = norms; ptr && ptr->lexeme; ptr++)
+ {
+ while (prstxt->curwords >= prstxt->lenwords)
+ {
+ prstxt->lenwords *= 2;
+ prstxt->words = (ParsedWord *) repalloc((void *) prstxt->words, prstxt->lenwords * sizeof(ParsedWord));
+ }
+ if (ptr->flags & TSL_ADDPOS)
+ pos++;
+
+ prstxt->words[prstxt->curwords].len = strlen(ptr->lexeme);
+ prstxt->words[prstxt->curwords].word = palloc( prstxt->words[prstxt->curwords].len);
+ memcpy(prstxt->words[prstxt->curwords].word, ptr->lexeme, prstxt->words[prstxt->curwords].len);
+ prstxt->words[prstxt->curwords].nvariant = ptr->nvariant;
+ prstxt->words[prstxt->curwords].flags = ptr->flags & TSL_PREFIX;
+ prstxt->words[prstxt->curwords].alen = 0;
+ prstxt->words[prstxt->curwords].pos.pos = pos;
+ prstxt->curwords++;
+ }
if (norms)
{
ptr = norms;
Index: src/backend/tsearch/wparser.c
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/wparser.c,v
retrieving revision 1.9
diff -u -r1.9 wparser.c
--- src/backend/tsearch/wparser.c 12 May 2008 00:00:50 -0000 1.9
+++ src/backend/tsearch/wparser.c 31 May 2008 08:44:01 -0000
@@ -317,6 +317,9 @@
prs.lenwords = 32;
prs.words = (HeadlineWordEntry *) palloc(sizeof(HeadlineWordEntry) * prs.lenwords);

+ prs.prstxt.lenwords = 32;
+ prs.prstxt.words = (ParsedWord *) palloc(sizeof(ParsedWord) * prs.prstxt.lenwords);
+
hlparsetext(cfg->cfgId, &prs, query, VARDATA(in), VARSIZE(in) - VARHDRSZ);

if (opt)
@@ -335,6 +338,11 @@
PG_FREE_IF_COPY(query, 2);
if (opt)
PG_FREE_IF_COPY(opt, 3);
+
+
+ /* prs.prstxt.words are all freed up by make_tsvector itself
+ * so don't need to free it now */
+
pfree(prs.words);
pfree(prs.startsel);
pfree(prs.stopsel);
Index: src/backend/tsearch/wparser_def.c
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/tsearch/wparser_def.c,v
retrieving revision 1.14
diff -u -r1.14 wparser_def.c
--- src/backend/tsearch/wparser_def.c 1 Jan 2008 19:45:52 -0000 1.14
+++ src/backend/tsearch/wparser_def.c 31 May 2008 15:14:16 -0000
@@ -1684,18 +1684,176 @@
return false;
}

-Datum
-prsd_headline(PG_FUNCTION_ARGS)
+static void
+mark_fragment(HeadlineParsedText *prs, int highlight, int startpos, int endpos)
{
- HeadlineParsedText *prs = (HeadlineParsedText *) PG_GETARG_POINTER(0);
- List *prsoptions = (List *) PG_GETARG_POINTER(1);
- TSQuery query = PG_GETARG_TSQUERY(2);
+ int i;
+ char *coversep = "...";
+ int coverlen = strlen(coversep);

- /* from opt + start and and tag */
- int min_words = 15;
- int max_words = 35;
- int shortword = 3;
+ for (i = startpos; i <= endpos; i++)
+ {
+ if (prs->words[i].item)
+ prs->words[i].selected = 1;
+ if (highlight == 0)
+ {
+ if (HLIDIGNORE(prs->words[i].type))
+ prs->words[i].replace = 1;
+ }
+ else
+ {
+ if (XMLHLIDIGNORE(prs->words[i].type))
+ prs->words[i].replace = 1;
+ }

+ prs->words[i].in = (prs->words[i].repeated) ? 0 : 1;
+ }
+ /* add cover separators if needed */
+ if (startpos > 0 && strncmp(prs->words[startpos-1].word, coversep,
+ prs->words[startpos-1].len) != 0)
+ {
+
+ prs->words[startpos-1].word = repalloc(prs->words[startpos-1].word, sizeof(char) * coverlen);
+ prs->words[startpos-1].in = 1;
+ prs->words[startpos-1].len = coverlen;
+ memcpy(prs->words[startpos-1].word, coversep, coverlen);
+ }
+ if (endpos+1 < prs->curwords && strncmp(prs->words[endpos+1].word, coversep,
+ prs->words[endpos+1].len) != 0)
+ {
+ prs->words[endpos+1].word = repalloc(prs->words[endpos+1].word, sizeof(char) * coverlen);
+ prs->words[endpos+1].in = 1;
+ memcpy(prs->words[endpos+1].word, coversep, coverlen);
+ }
+}
+
+static void
+mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, int highlight,
+ int num_fragments, int maxcoversize)
+{
+ DocRepresentation* doc;
+ Extention ext;
+ int4 coverlen, doclen;
+ int4 startpos = 0, endpos = 0;
+ QueryRepresentation qr;
+ int4 i, f, numcovers = 0, maxcovers = 32, maxstretch;
+ int4 min, minI = 0;
+ CoverPos *covers = palloc(maxcovers * sizeof(CoverPos));
+ TSVector t = make_tsvector(&(prs->prstxt));
+
+ qr.query = query;
+ qr.operandexist = (bool*) palloc0(sizeof(bool) * query->size);
+
+ /* start generating covers for the query */
+ doc = get_docrep(t, &qr, &doclen);
+ if (!doc)
+ {
+ pfree(qr.operandexist);
+ pfree(covers);
+ /* cannot do anything */
+ return;
+ }
+
+ /* get all covers */
+ MemSet(&ext, 0, sizeof(Extention));
+ while (Cover(doc, doclen, &qr, &ext))
+ {
+ if (numcovers >= maxcovers)
+ {
+ maxcovers *= 2;
+ covers = repalloc(covers, sizeof(CoverPos) * maxcovers);
+ }
+ covers[numcovers].startpos = ext.p;
+ covers[numcovers].endpos = ext.q;
+
+ covers[numcovers].in = 0;
+ covers[numcovers].excluded = 0;
+ numcovers ++;
+ }
+ /* we do not need tsvector any more, free it */
+ if (t)
+ pfree(t);
+
+ /* choose best covers */
+ for (f = 0; f < num_fragments; f++)
+ {
+ min = 9999999; /* XXX - will not display headlines that exceed 9999999 */
+ for (i = 0; i < numcovers; i ++)
+ {
+ coverlen = covers[i].endpos - covers[i].startpos + 1;
+ if (!covers[i].in && !covers[i].excluded && min > coverlen)
+ {
+ min = coverlen;
+ minI = i;
+ }
+ }
+ if (min < 9999999)
+ {
+ covers[minI].in = 1;
+ /* adjust the size of cover
+ * if maxcoversize >= len
+ * then headline from ext.p - (maxcoversize-len)/2 to ext.q + (maxcoverSize-len) /2
+ * if maxcoverSize < len
+ * then headline from ext.p to ext.p + maxcoverSize
+ * (ensures starting lexeme is in the headline)
+ */
+ /* cut down endpos if it crosses maxWords */
+ startpos = covers[minI].startpos;
+ endpos = covers[minI].endpos;
+ coverlen = endpos - startpos + 1;
+
+ if (maxcoversize > coverlen)
+ {
+ /* stretch it to maxwords */
+ maxstretch = maxcoversize;
+
+ /* divide the stretch on both sides of cover */
+ startpos -= (maxstretch - coverlen)/2;
+ endpos += (maxstretch - coverlen)/2;
+ if (startpos < 0)
+ startpos = 0;
+ /* XXX - do we need to check whether endpos crosses the document
+ * the other function would return if the document ends or the
+ * endpos is reached.
+ * Dropping this check for time being
+ */
+ }
+ else if (maxcoversize < coverlen)
+ endpos = startpos + maxcoversize;
+ covers[minI].startpos = startpos;
+ covers[minI].endpos = endpos;
+
+ /* exclude overlapping covers */
+ for (i = 0; i < numcovers; i ++)
+ {
+ if (i != minI &&
+ (covers[i].startpos >= covers[minI].startpos &&
+ covers[i].startpos <= covers[minI].endpos))
+ covers[i].excluded = 1;
+ }
+ }
+ else
+ break;
+ }
+
+ /* Mark the chosen fragments (covers) */
+
+ for (i = 0; i < numcovers; i++)
+ {
+ if (!covers[i].in)
+ continue;
+
+ startpos = covers[i].startpos;
+ endpos = covers[i].endpos;
+
+ mark_fragment(prs, highlight, covers[i].startpos, covers[i].endpos);
+ }
+ pfree(qr.operandexist);
+ pfree(covers);
+}
+static void
+mark_hl_words(HeadlineParsedText *prs, TSQuery query, int highlight, int shortword, int min_words, int max_words)
+{
int p = 0,
q = 0;
int bestb = -1,
@@ -1707,56 +1865,9 @@
curlen;

int i;
- int highlight = 0;
- ListCell *l;
-
- /* config */
- prs->startsel = NULL;
- prs->stopsel = NULL;
- foreach(l, prsoptions)
- {
- DefElem *defel = (DefElem *) lfirst(l);
- char *val = defGetString(defel);
-
- if (pg_strcasecmp(defel->defname, "MaxWords") == 0)
- max_words = pg_atoi(val, sizeof(int32), 0);
- else if (pg_strcasecmp(defel->defname, "MinWords") == 0)
- min_words = pg_atoi(val, sizeof(int32), 0);
- else if (pg_strcasecmp(defel->defname, "ShortWord") == 0)
- shortword = pg_atoi(val, sizeof(int32), 0);
- else if (pg_strcasecmp(defel->defname, "StartSel") == 0)
- prs->startsel = pstrdup(val);
- else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
- prs->stopsel = pstrdup(val);
- else if (pg_strcasecmp(defel->defname, "HighlightAll") == 0)
- highlight = (pg_strcasecmp(val, "1") == 0 ||
- pg_strcasecmp(val, "on") == 0 ||
- pg_strcasecmp(val, "true") == 0 ||
- pg_strcasecmp(val, "t") == 0 ||
- pg_strcasecmp(val, "y") == 0 ||
- pg_strcasecmp(val, "yes") == 0);
- else
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized headline parameter: \"%s\"",
- defel->defname)));
- }

if (highlight == 0)
{
- if (min_words >= max_words)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("MinWords should be less than MaxWords")));
- if (min_words <= 0)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("MinWords should be positive")));
- if (shortword < 0)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("ShortWord should be >= 0")));
-
while (hlCover(prs, query, &p, &q))
{
/* find cover len in words */
@@ -1877,6 +1988,82 @@
prs->words[i].in = (prs->words[i].repeated) ? 0 : 1;
}

+}
+
+Datum
+prsd_headline(PG_FUNCTION_ARGS)
+{
+ HeadlineParsedText *prs = (HeadlineParsedText *) PG_GETARG_POINTER(0);
+ List *prsoptions = (List *) PG_GETARG_POINTER(1);
+ TSQuery query = PG_GETARG_TSQUERY(2);
+
+ /* from opt + start and and tag */
+ int min_words = 15;
+ int max_words = 35;
+ int shortword = 3;
+ int num_fragments = 0;
+ int highlight = 0;
+ ListCell *l;
+
+ /* config */
+ prs->startsel = NULL;
+ prs->stopsel = NULL;
+ foreach(l, prsoptions)
+ {
+ DefElem *defel = (DefElem *) lfirst(l);
+ char *val = defGetString(defel);
+
+ if (pg_strcasecmp(defel->defname, "MaxWords") == 0)
+ max_words = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "MinWords") == 0)
+ min_words = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "ShortWord") == 0)
+ shortword = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "NumFragments") == 0)
+ num_fragments = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "StartSel") == 0)
+ prs->startsel = pstrdup(val);
+ else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
+ prs->stopsel = pstrdup(val);
+ else if (pg_strcasecmp(defel->defname, "HighlightAll") == 0)
+ highlight = (pg_strcasecmp(val, "1") == 0 ||
+ pg_strcasecmp(val, "on") == 0 ||
+ pg_strcasecmp(val, "true") == 0 ||
+ pg_strcasecmp(val, "t") == 0 ||
+ pg_strcasecmp(val, "y") == 0 ||
+ pg_strcasecmp(val, "yes") == 0);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized headline parameter: \"%s\"",
+ defel->defname)));
+ }
+
+ if (highlight == 0)
+ {
+ if (min_words >= max_words)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("MinWords should be less than MaxWords")));
+ if (min_words <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("MinWords should be positive")));
+ if (shortword < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ShortWord should be >= 0")));
+ if (num_fragments < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("NumFragments should be >= 0")));
+
+ if (num_fragments == 0)
+ /* call the default headline generator */
+ mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
+ else
+ mark_hl_fragments(prs, query, highlight, num_fragments, max_words);
+ }
if (!prs->startsel)
prs->startsel = pstrdup("<b>");
if (!prs->stopsel)
@@ -1886,3 +2073,4 @@

PG_RETURN_POINTER(prs);
}
+
Index: src/backend/utils/adt/tsrank.c
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/backend/utils/adt/tsrank.c,v
retrieving revision 1.13
diff -u -r1.13 tsrank.c
--- src/backend/utils/adt/tsrank.c 16 May 2008 16:31:01 -0000 1.13
+++ src/backend/utils/adt/tsrank.c 30 May 2008 17:44:01 -0000
@@ -17,6 +17,7 @@

#include "tsearch/ts_type.h"
#include "tsearch/ts_utils.h"
+#include "tsearch/ts_rank.h"
#include "utils/array.h"
#include "miscadmin.h"

@@ -488,14 +489,6 @@
PG_RETURN_FLOAT4(res);
}

-typedef struct
-{
- QueryItem **item;
- int16 nitem;
- uint8 wclass;
- int32 pos;
-} DocRepresentation;
-
static int
compareDocR(const void *va, const void *vb)
{
@@ -507,12 +500,6 @@
return (a->pos > b->pos) ? 1 : -1;
}

-typedef struct
-{
- TSQuery query;
- bool *operandexist;
-} QueryRepresentation;
-
#define QR_GET_OPERAND_EXISTS(q, v) ( (q)->operandexist[ ((QueryItem*)(v)) - GETQUERY((q)->query) ] )
#define QR_SET_OPERAND_EXISTS(q, v) QR_GET_OPERAND_EXISTS(q,v) = true

@@ -524,17 +511,7 @@
return QR_GET_OPERAND_EXISTS(qr, val);
}

-typedef struct
-{
- int pos;
- int p;
- int q;
- DocRepresentation *begin;
- DocRepresentation *end;
-} Extention;
-
-
-static bool
+bool
Cover(DocRepresentation *doc, int len, QueryRepresentation *qr, Extention *ext)
{
DocRepresentation *ptr;
@@ -615,7 +592,7 @@
return Cover(doc, len, qr, ext);
}

-static DocRepresentation *
+DocRepresentation *
get_docrep(TSVector txt, QueryRepresentation *qr, int *doclen)
{
QueryItem *item = GETQUERY(qr->query);
Index: src/include/tsearch/ts_public.h
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/include/tsearch/ts_public.h,v
retrieving revision 1.9
diff -u -r1.9 ts_public.h
--- src/include/tsearch/ts_public.h 16 May 2008 16:31:02 -0000 1.9
+++ src/include/tsearch/ts_public.h 31 May 2008 15:10:24 -0000
@@ -14,6 +14,7 @@
#define _PG_TS_PUBLIC_H_

#include "tsearch/ts_type.h"
+#include "tsearch/ts_utils.h"

/*
* Parser's framework
@@ -47,6 +48,7 @@

typedef struct
{
+ ParsedText prstxt;
HeadlineWordEntry *words;
int4 lenwords;
int4 curwords;
@@ -55,6 +57,24 @@
int2 startsellen;
int2 stopsellen;
} HeadlineParsedText;
+/*
+ * headline framework, flow in common to generate:
+ * 1 parse text with hlparsetext
+ * 2 parser-specific function to find part
+ * 3 generateHeadline to generate result text
+ */
+
+typedef struct
+{
+ int4 startpos;
+ int4 endpos;
+ int2 in;
+ int2 excluded;
+} CoverPos;
+
+extern void hlparsetext(Oid cfgId, HeadlineParsedText *prs, TSQuery query,
+ char *buf, int4 buflen);
+extern text *generateHeadline(HeadlineParsedText *prs);

/*
* Common useful things for tsearch subsystem
Index: src/include/tsearch/ts_rank.h
===================================================================
RCS file: src/include/tsearch/ts_rank.h
diff -N src/include/tsearch/ts_rank.h
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ src/include/tsearch/ts_rank.h 30 May 2008 17:44:01 -0000
@@ -0,0 +1,36 @@
+#ifndef __TSRANK_H__
+#define __TSRANK_H__
+
+#include "ts_type.h"
+#include "ts_cache.h"
+
+typedef struct
+{
+ QueryItem **item;
+ int16 nitem;
+ uint8 wclass;
+ int32 pos;
+} DocRepresentation;
+
+typedef struct
+{
+ TSQuery query;
+ bool *operandexist;
+} QueryRepresentation;
+
+typedef struct
+{
+ int pos;
+ int p;
+ int q;
+ DocRepresentation *begin;
+ DocRepresentation *end;
+} Extention;
+
+bool
+Cover(DocRepresentation *doc, int len, QueryRepresentation *qr, Extention *ext);
+
+DocRepresentation *
+get_docrep(TSVector txt, QueryRepresentation *qr, int *doclen);
+
+#endif /* __TSRANK_H__ */
Index: src/include/tsearch/ts_utils.h
===================================================================
RCS file: /home/sushant/devel/pgsql-cvs/pgsql/src/include/tsearch/ts_utils.h,v
retrieving revision 1.15
diff -u -r1.15 ts_utils.h
--- src/include/tsearch/ts_utils.h 16 May 2008 16:31:02 -0000 1.15
+++ src/include/tsearch/ts_utils.h 30 May 2008 23:18:08 -0000
@@ -13,7 +13,7 @@
#define _PG_TS_UTILS_H_

#include "tsearch/ts_type.h"
-#include "tsearch/ts_public.h"
+#include "tsearch/ts_rank.h"
#include "nodes/pg_list.h"

/*
@@ -90,16 +90,6 @@

extern void parsetext(Oid cfgId, ParsedText *prs, char *buf, int4 buflen);

-/*
- * headline framework, flow in common to generate:
- * 1 parse text with hlparsetext
- * 2 parser-specific function to find part
- * 3 generateHeadline to generate result text
- */
-
-extern void hlparsetext(Oid cfgId, HeadlineParsedText *prs, TSQuery query,
- char *buf, int4 buflen);
-extern text *generateHeadline(HeadlineParsedText *prs);

/*
* Common check function for tsvector @@ tsquery
I have attached a new patch with respect to the current CVS head. This
produces a headline for a document given a query. Basically it
identifies fragments of text that contain the query and displays them.

DESCRIPTION

HeadlineParsedText contains an array of the actual words but no
information about the norms. We need an indexed position vector for each
norm so that we can quickly evaluate a number of possible fragments -
something that tsvector provides.

So this patch changes HeadlineParsedText to contain the norms
(ParsedText). This field is updated while parsing in hlparsetext. The
position information of the norms corresponds to the position of words
in HeadlineParsedText (not to the norms positions as is the case in
tsvector). This works correctly with the current parser. If you think
there may be issues with other parsers please let me know.

This approach does not change any other interface and fits nicely with
the overall framework.

The norms are converted into tsvector and a number of covers are
generated. The best covers are then chosen to be in the headline. The
covers are separated using a hardcoded coversep. Let me know if you want
to expose this as an option.

Covers that overlap with already chosen covers are excluded.

Some options like ShortWord and MinWords are not taken care of right
now. MaxWords is used as maxcoversize. Let me know if you would like to
see other options for fragment generation as well.
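
With the patch applied, a fragment-based headline would be requested
along these lines (a sketch; the table, column, and query are made up):

-- NumFragments > 0 selects the new cover-based generator;
-- MaxWords caps the size of each fragment.
SELECT ts_headline('english', body,
                   to_tsquery('english', 'phrase & search'),
                   'NumFragments=2, MaxWords=20')
FROM documents
WHERE to_tsvector('english', body) @@
      to_tsquery('english', 'phrase & search');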

Let me know any more changes you would like to see.

-Sushant.

On Tue, 2008-05-27 at 13:30 +0400, Teodor Sigaev wrote:
> Hi!
>
> > 1. Why is hlparsetext used to parse the document rather than the
> > parsetext function? Since words to be included in the headline will be
> > marked afterwards, it seems more reasonable to just use the parsetext
> > function.
> > The main difference I see is the use of hlfinditem and marking whether
> > some word is repeated.
> hlparsetext preserves any kind of lexeme - not indexed, spaces etc. parsetext
> doesn't.
> hlparsetext preserves original form of lexemes. parsetext doesn't.
>
> >
> > The reason this is important is that hlparsetext does not seem to be
> > storing word positions which parsetext does. The word positions are
> > important for generating headline with fragments.
> Not needed - hlparsetext preserves the whole text, so the position is just the
> array index.
>
> >
> > 2.
> >> I would prefer the signature ts_headline( [regconfig,] text, tsquery
> >> [,text] )and function should accept 'NumFragments=>N' for default
> >> parser. Another parsers may use another options.
> >
> > Does this mean we want a unified function ts_headline and we trigger the
> > fragments if NumFragments is specified?
>
> Trigger should be inside parser-specific function (pg_ts_parser.prsheadline).
> Another parsers might not recognize that option.
>
> > It seems that introducing a new
> > function which can take configuration OID, or name is complex as there
> > are so many functions handling these issues in wparser.c.
> No, of course - ts_headline takes care of finding the configuration and calling
> the correct parser.
>
> >
> > If this is true then we need to just add marking of headline words in
> > prsd_headline. Otherwise we will need another prsd_headline_with_covers
> > function.
> Yeah, pg_ts_parser.prsheadline should mark the lexemes too. It can even change
> the array in HeadlineParsedText.
>
> >
> > 3. In many cases people may already have TSVector for a given document
> > (for search operation). Would it be faster to pass TSVector to headline
> > function when compared to computing TSVector each time? If that is the
> > case then should we have an option to pass TSVector to headline
> > function?
> As I mentioned above, tsvector doesn't contain the whole information about the text.
>

Re: [HACKERS] Overhauling GUCS

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


In the interest of constructive criticism, here's some
ways I think postgresql.conf could be improved.

* A weak +1 to the #include idea. I'm much more inclined to simply
add a new section to the bottom of the file and use version
control, but I could see ways in which include would be useful.


* Rearrange the settings to put the more "common" ones at
the top, even if it means messing up the categories a bit.
Having 'bonjour_name' at the top, and 'default_statistics_target'
buried way down below is crazy. The sections should be
looked at from a clean perspective: do we need them at all? Are there
better ways to arrange things? What order should they be in?


* Much more verbose comments. The abovementioned default_statistics_target
is a very important setting, but there is zero explanation in the file
of what it is. The only thing we're told is that it ranges from 10 - 1000.
We can do better than that. Users would absolutely love it if each item
had a clear explanation, and it would be well worth a slightly increased
file size. See the postfix main.cf file for a good example of such.


* Remove the confusing "commented out is default" bit entirely. Simply
set each value explicitly. Why should new users have to confront this?:

# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.


* Lose the tabbed indenting, which looks bad, especially for multi-line
comments. Just use spaces.


* Create a tool, or at least a best practices, for controlling and tracking
changes to the file.


* Put some doc URLs in the file, at the minimum per major section. At the
very bare minimum, a real URL at the top.


* Indicate which values can be changed per-session or per-role.


* Fix the disparity between the syntax in the file and the SET interface.
For example, work_mem = 16MB works in the conf file, but you have to write
SET work_mem = '16MB'. Easiest is probably just to quote everything in the conf.


* Lose the post-value, end-of-line comments, they just get in the way when making
changes and make the file harder to read by contributing to the wrap problem.


* I'm tempted by the argument of creating a separate file for the obscure
settings, but I think it would be too much pain, and nobody would ever agree on
which settings are 'important' and which are 'obscure'.


* It might be nice to mention other ways to reload the file, such as
'service postgresql reload', or whatever Windows uses.


* The word 'paramters' is still misspelled. :)


* That whole sentence about changing the parameters as command-line
options needs to go away.


* Since the executable is now named "postgres" (thank goodness we got
rid of "postmaster"), the file should be named 'postgres.conf'. This would
also be a way to quickly distinguish 'old' vs 'new' style conf files if
we end up making major changes to it.


- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200805311911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkhB39sACgkQvJuQZxSWSshahQCg4V5QsO34HOhUDoPzT7STcR45
V5UAoPQxkmuk/oCYirTKxMAhV+Kh8Ytz
=7Lgk
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Overhauling GUCS

On May 31, 2008, at 15:32, Peter Eisentraut wrote:

>> 1. Most people have no idea how to set these.
>
> Could you clarify this? I can't really believe that people are
> incapable of
> editing a configuration file.

I've been using PostgreSQL on and off, mostly on, for almost 10 years.
I still have no idea what 75% of those settings in postgresql.conf
mean or are for. There are an overwhelming number of them. I know that
5-8 of them I always touch, thanks largely to assistance now and then
from Josh Berkus, but the rest are just complexity to me. I don't
doubt that the vast majority of them are useful in one situation or
another, but unless I'm in one of those situations, I really don't
need to see them there and be confused by them.

Does that help?

>> 2. The current postgresql.conf file is a huge mess of 194 options,
>> the
>> vast majority of which most users will never touch.
>
> My opinion has always been that we should provide a default file
> with only the
> essential options instead of all of them. I see this as the major
> problem,
> because people are overwhelmed and consequently don't set anything.

+1

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Overhauling GUCS

Josh Berkus wrote:
> Currently, PostgreSQL,conf and our set of GUC configurations suffer from
> 4 large problems:

As we have talked about before, you know that I agree that the GUC system
could use some improvements. But I'm a bit surprised about some of your
assessments.

> 1. Most people have no idea how to set these.

Could you clarify this? I can't really believe that people are incapable of
editing a configuration file.

> 2. The current postgresql.conf file is a huge mess of 194 options, the
> vast majority of which most users will never touch.

My opinion has always been that we should provide a default file with only the
essential options instead of all of them. I see this as the major problem,
because people are overwhelmed and consequently don't set anything.

> 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
> and the settings.sgml), which are only synched with each other manually.

While this is not ideal, I can't really see how this is a major problem, at
least from the perspective of the user.

> 4. We don't seem to be getting any closer to autotuning.

True. But how does your proposal address this?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Overhauling GUCS

On May 31, 2008, at 12:36, Gregory Stark wrote:

> What this sounds like is a sly way to try to get rid of
> postgresql.conf
> entirely and replace it with parameters stored in the database so
> admins would
> adjust the parameters using an SQL syntax rather than a text file.
>
> There are pros and cons of such a system but I think for newbie
> admins that
> would be a thousand times *more* baffling. You would have to learn new
> commands and have no holistic view of what parameters had been set,
> what
> related parameters might exist. You also have no way to keep the
> file in a
> version control system or sync across servers etc.

FWIW, this has not been a barrier to MySQL adoption.

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PERFORM] 2GB or not 2GB

"Josh Berkus" <josh@agliodbs.com> writes:

> Simon,
>
>> There is an optimum for each specific sort.
>
> Well, if the optimum is something other than "as much as we can get", then we
> still have a pretty serious issue with work_mem, no?

With the sort algorithm. The problem is that the database can't predict the
future and doesn't know how many more records will be arriving and how out of
order they will be.

What appears to be happening is that if you give the tape sort a large amount
of memory it keeps a large heap filling that memory. If that large heap
doesn't actually save any passes and doesn't reduce the number of output tapes
then it's just wasted cpu time to maintain such a large heap. If you have any
clever ideas on how to auto-size the heap based on how many output tapes it
will create or avoid then by all means speak up.

--
Gregory Stark
EnterpriseDB

http://www.enterprisedb.com

Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [HACKERS] Overhauling GUCS

"Josh Berkus" <josh@agliodbs.com> writes:

> It's my viewpoint based on a lot of user feedback that the current
> postgresql.conf is fundamentally broken and a major roadblock to PostgreSQL
> adoption. This was a point with which there was pretty much universal
> agreement when I talked with people at pgCon.

Actually, as a new DBA first starting out with Postgres, I found it very
convenient to have all the common parameters in one place where I could
just uncomment and adjust them, instead of having to search through the
documentation to find the default value from which to start.

> 1 & 2) by not having the settings be defined in a 500-line file, new users
> would no longer be baffled by scores of settings which probably don't concern
> them, trying to find the handful of settings which do.

I'm not sure how an empty file is any less "baffling" than one listing the
default value for parameters they don't need yet.

> 3) We'd consolidate the GUC lists down from 3 places to 2, which is one less
> area to synchronize. Magnus and I looked to see if it might be possible to
> generate the docs from the same list, but it's not practical.

This seems like a trivial gain and one which is unlikely to outweigh the pain
of having to massage the info into C data structures.

> 4) By shifting from a model where postgresql.conf is document-formatted and
> hand-edited to one where it's machine generated, it becomes vastly easier to
> write simple utilities to manage these settings. Right now, the big
> "obstacle" to things like SET PERSISTENT is "how to we preseve the
> hand-edited comments in the file" -- and the answer is we *don't.*

What this sounds like is a sly way to try to get rid of postgresql.conf
entirely and replace it with parameters stored in the database so admins would
adjust the parameters using an SQL syntax rather than a text file.

There are pros and cons of such a system but I think for newbie admins that
would be a thousand times *more* baffling. You would have to learn new
commands and have no holistic view of what parameters had been set, what
related parameters might exist. You also have no way to keep the file in a
version control system or sync across servers etc.

> Have you *looked* at postgresql.conf.sample lately, Tom? It's a disaster.
> Maintenance is already difficult, and becoming more so as we add settings.

What difficulties?

--
Gregory Stark
EnterpriseDB

http://www.enterprisedb.com

Ask me about EnterpriseDB's 24x7 Postgres support!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Overhauling GUCS

"David E. Wheeler" <david@kineticode.com> writes:

> I'd love to see these issues resolved. The current postgresql.conf is way over
> the top. Might you have a better idea?

I don't think fiddling with surface issues like the formatting of the
postgresql.conf is productive. Hiding parameters because you don't think
beginners need them is only going to frustrate those people who do need to
adjust them.

What might be productive is picking up a group of parameters and thinking hard
about what they mean in terms of user-visible real-world effects. If they can
be recast in terms of behaviour the user wants instead of internal
implementation details then that would translate into a massive simplification
as well as being easier to explain to users.

I think we do a pretty good job of this already. Witness things like
effective_cache_size -- imagine if this were "nested_loop_cache_hit_rate", for
example; good luck figuring out what to set it to.

The vacuum cost delay factors are probably ripe for such a recast though. I
think we need just one parameter "vacuum_io_bandwidth" or something like that.
The bgwriter parameters might also be a candidate but I'm less certain.
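
To make the idea concrete, here is a sketch (the five cost settings below
are today's real knobs; vacuum_io_bandwidth is purely hypothetical):

    # current postgresql.conf: five interacting vacuum cost settings
    vacuum_cost_delay = 20ms
    vacuum_cost_page_hit = 1
    vacuum_cost_page_miss = 10
    vacuum_cost_page_dirty = 20
    vacuum_cost_limit = 200

    # hypothetical recast in user-visible terms (no such parameter exists):
    #vacuum_io_bandwidth = 8MB    # of I/O per second for vacuum

The user would state how much I/O vacuum may consume, and the cost
arithmetic would become an internal detail.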

--
Gregory Stark
EnterpriseDB

http://www.enterprisedb.com

Ask me about EnterpriseDB's PostGIS support!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] Converting empty input strings to Nulls

On Sat, 2008-05-31 at 13:40 -0400, Ken Winter wrote:
> I'm looking for a way to trap this bad input at the database level, quietly
> convert the input empty strings to Null, and store the Null in the date
> column. I tried a BEFORE INSERT OR UPDATE trigger evoking this function ...

You can replace your table with a view and use rules to transform the
updates and inserts.
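
A minimal sketch of that approach (table and column names are made up here,
assuming a date column as in your example; an ON UPDATE rule would be needed
as well):

    CREATE TABLE person_base (id serial PRIMARY KEY, birth_date date);

    -- expose the column as text so an empty string survives parsing
    -- long enough for the rule to convert it
    CREATE VIEW person AS
        SELECT id, birth_date::text AS birth_date FROM person_base;

    CREATE RULE person_ins AS ON INSERT TO person DO INSTEAD
        INSERT INTO person_base (birth_date)
        VALUES (NULLIF(NEW.birth_date, '')::date);

Clients then insert through the view, and '' becomes NULL before the cast to
date ever happens.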

Regards,
Jeff Davis


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [PERFORM] 2GB or not 2GB

Simon,

> There is an optimum for each specific sort.

Well, if the optimum is something other than "as much as we can get", then we
still have a pretty serious issue with work_mem, no?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [GENERAL] Converting empty input strings to Nulls

Pavel Stehule wrote:
> Hello
>
> 2008/5/31 Ken Winter <ken@sunward.org>:
>> ... but an empty string still evokes the error even before this function is
>> triggered.
>>
>> Is there a way to convert empty strings to Nulls before the error is evoked?
>>
> no - it's not possible. And some "magic" fix in triggers is bad style.

The rule system may be able to handle this transformation (using insert
rules or an updateable view). Getting it to work could be a bit arcane,
though, and having never really delved into the rule system I can't be
of much help.

See:

http://www.postgresql.com.cn/docs/8.3/static/rules.html

http://www.postgresql.com.cn/docs/8.3/static/sql-createrule.html

http://wiki.postgresql.org/wiki/Updatable_views

http://wiki.postgresql.org/wiki/Introduction_to_PostgreSQL_Rules_-_Making_entries_which_can't_be_altered

Personally, though, if at all possible I'd fix the broken client
application. An empty string ("") is NOT NULL. If you can't fix it, consider
at least clearly documenting the wacky behaviour and, if possible, applying
it only to an updatable view rather than the base table.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [pgsql-advocacy] Feedback on blog post about Replication Feature decision and its impact

Dirk,

> <a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
> database startup whose product builds on PostgreSQL. EnterpriseDB adds
> many "enterprise-readiness" features to the basic PostgreSQL product,
> including database replication, and much more.

The replication-in-core vs. not-in-core question has absolutely nothing to do
with EnterpriseDB either way. I think you'd be doing a disservice to your
readers by implying that it does. Nor does it have anything to do with the
GPL. If you want to blog about these things, maybe break them up into
separate posts?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

--
Sent via pgsql-advocacy mailing list (pgsql-advocacy@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-advocacy

Re: [HACKERS] Overhauling GUCS

Simon, Tom, Greg,

> Now, that will get a lot easier just by
> virtue of having a smaller config file, but I think that adding something
> into pg_settings that allows saving user-added commentary would be a nice
> step toward some useful standardization on that side of things. It would
> make future automated tools aimed at parsing and generating new files, as
> part of things like version upgrades, a lot easier if there was a standard
> way such comments were handled in addition to the raw data itself.

Hmmm. What about a COMMENT ON SETTING? That seems like it would serve the
purpose ... and make preserving user comments easier than the current
file-conversion approach.
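
Something along these lines (hypothetical syntax, not implemented; it just
follows the pattern of the existing COMMENT ON commands):

    COMMENT ON SETTING shared_buffers IS
        'Raised from the default for the reporting workload';

The comment would live in the catalog next to the value, instead of in a
file that has to be re-parsed.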

> The other thing I'd like to see make its way into pg_settings, so that
> tools can operate on it just by querying the database, is noting what file
> the setting came from so that you can track things like include file
> usage. I think with those two additions (comments and source file
> tracking) it would even be conceivable to clone a working facsimile of
> even a complicated postgresql.conf file set remotely just by reading
> pg_settings.

Hmmm. Any idea how to do this? It sounds like a good idea to me.
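
If it worked, a tool could do something like this (hypothetical: it assumes
pg_settings grew sourcefile and comment columns; the source column already
exists):

    SELECT name, setting, sourcefile, comment
      FROM pg_settings
     WHERE source = 'configuration file';

and rebuild the file set from the result.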

> So my viewpoint is that we should be aggressively adding new features,
> yet be conservative in changing existing behaviour: provide options for
> behaves-like-previous-release and keep the administrative interfaces as
> similar as possible between releases.

It's my viewpoint based on a lot of user feedback that the current
postgresql.conf is fundamentally broken and a major roadblock to PostgreSQL
adoption. This was a point with which there was pretty much universal
agreement when I talked with people at pgCon. That is, I believe that you're
arguing for keeping .conf stable for the 5% of users who understand it and
ignoring the 95% of users who are baffled by it.

At this point, I think we are the only major SQL database without some form of
basic autotuning for the most significant settings (certainly MySQL, Oracle,
and SQL Server have it); we've been able to coast through that because
autotuning features are new in other DBs, but it's going to start to hurt us
pretty soon.

Now, I do believe that we should plan any GUCS overhaul to happen in a single
PostgreSQL version, rather than phasing it in over multiple versions, so that
administrators only need to get used to a new way once.

> > Currently, postgresql.conf and our set of GUC configurations suffer from
> > 4 large problems:
> >
> > 1. Most people have no idea how to set these.
> > 2. The current postgresql.conf file is a huge mess of 194 options, the
> > vast majority of which most users will never touch.
> > 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
> > and the settings.sgml), which are only synched with each other manually.
> > 4. We don't seem to be getting any closer to autotuning.
>
> The proposal doesn't actually solve any of those problems.

OK, let me draw some lines:
1 & 2) By not having the settings defined in a 500-line file, new users
would no longer be baffled by scores of settings which probably don't concern
them while trying to find the handful of settings which do.

3) We'd consolidate the GUC lists down from 3 places to 2, which is one less
area to synchronize. Magnus and I looked to see if it might be possible to
generate the docs from the same list, but it's not practical.

4) By shifting from a model where postgresql.conf is document-formatted and
hand-edited to one where it's machine generated, it becomes vastly easier to
write simple utilities to manage these settings. Right now, the big
"obstacle" to things like SET PERSISTENT is "how to we preseve the
hand-edited comments in the file" -- and the answer is we *don't.*
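
Once the file is machine-generated, the proposed command becomes trivial
(SET PERSISTENT is not implemented; this is just the shape of the idea):

    SET PERSISTENT work_mem = '64MB';  -- rewrite the generated file, SIGHUP

No hand-edited comments means nothing to merge or preserve.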

> I disagree with doing any of this. It doesn't result in any useful
> reduction in maintenance effort, and what it does do is make it
> impossible to keep control over the detailed layout, formatting,
> commenting etc in a sample postgresql.conf.

Have you *looked* at postgresql.conf.sample lately, Tom? It's a disaster.
Maintenance is already difficult, and becoming more so as we add settings.

Further, you and Simon seem to think that the current "narrative docs inside
the conf file" format has some kind of value which makes things easier for
DBAs. I don't believe it does, and I have yet to meet a *single* new
PostgreSQL user who wasn't confused and intimidated by the file.

> Nor do I think that
> "generate a whole config file from scratch" is going to be a useful
> behavior for tuning problems --- how will you merge it with what
> you had before?

Who's merging? I don't think you get the proposal. The whole "hand-edited"
approach to postgresql.conf should go away. It's not useful, it's not
educational, and it doesn't make PostgreSQL easy to manage.

Further, the lack of an easy way to manage settings via port access to
PostgreSQL is a significant inhibitor to adopting PostgreSQL in environments
with large numbers of servers. See prior arguments by the CPANEL folks about
why they don't support PostgreSQL, which is in turn a major reason why
PostgreSQL web hosting is hard to find.

I agree that editing the data about settings in the guc.c file is not ideal;
Magnus and I discussed that mainly because we wanted to preserve the
translation framework with gettext strings. If someone can think of a better
way to do this part, I'm all ears.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] Converting empty input strings to Nulls

Hello

2008/5/31 Ken Winter <ken@sunward.org>:
> Applications accessing my PostgreSQL 8.0 database like to submit no-value
> date column values as empty strings rather than as Nulls. This, of course,
> causes this PG error:
>
> SQL State: 22007
> ERROR: invalid input syntax for type date: ""
>
> I'm looking for a way to trap this bad input at the database level, quietly
> convert the input empty strings to Null, and store the Null in the date
> column. I tried a BEFORE INSERT OR UPDATE trigger evoking this function ...
>
> CREATE OR REPLACE FUNCTION "public"."empty_string_to_null"()
> RETURNS trigger AS
> $BODY$
> BEGIN
> IF CAST(NEW.birth_date AS text) = '' THEN
> NEW.birth_date = Null;
> END IF;
> RETURN NEW;
> END;
> $BODY$
> LANGUAGE 'plpgsql' VOLATILE;
>
> ... but an empty string still evokes the error even before this function is
> triggered.
>
> Is there a way to convert empty strings to Nulls before the error is evoked?
>
no - it's not possible. And some "magic" fix in triggers is bad style.

You can write your own custom type (not in plpgsql) that allows this
behaviour. You can copy and modify PostgreSQL's DateADT implementation:
pgsql/src/backend/utils/adt/date.c

Datum
date_in(PG_FUNCTION_ARGS)
{
    char       *str = PG_GETARG_CSTRING(0);
    DateADT     date;
    fsec_t      fsec;
    struct pg_tm tt,
               *tm = &tt;
    int         tzp;
    int         dtype;
    int         nf;
    int         dterr;
    char       *field[MAXDATEFIELDS];
    int         ftype[MAXDATEFIELDS];
    char        workbuf[MAXDATELEN + 1];

    /* your hack: treat an empty string as SQL NULL */
    if (strlen(str) == 0)
        PG_RETURN_NULL();

    dterr = ParseDateTime(str, workbuf, sizeof(workbuf),
                          field, ftype, MAXDATEFIELDS, &nf);
    if (dterr == 0)
        dterr = DecodeDateTime(field, ftype, nf, &dtype, tm,
                               &fsec, &tzp);
    if (dterr != 0)
        DateTimeParseError(dterr, str, "date");

    switch (dtype)
    /* ... rest of the function is unchanged ... */


more: http://www.postgresql.org/docs/8.3/interactive/xtypes.html

I am not sure whether an input function can return NULL. You should test it.

regards
Pavel Stehule

> ~ TIA
> ~ Ken
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Overhauling GUCS

On May 31, 2008, at 09:23, Tom Lane wrote:

>> 1. Most people have no idea how to set these.
>> 2. The current postgresql.conf file is a huge mess of 194 options,
>> the
>> vast majority of which most users will never touch.
>> 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
>> and the settings.sgml), which are only synched with each other
>> manually.
>> 4. We don't seem to be getting any closer to autotuning.
>
> The proposal doesn't actually solve any of those problems.

It solves #2 at least.

> I disagree with doing any of this. It doesn't result in any useful
> reduction in maintenance effort, and what it does do is make it
> impossible to keep control over the detailed layout, formatting,
> commenting etc in a sample postgresql.conf. Nor do I think that
> "generate a whole config file from scratch" is going to be a useful
> behavior for tuning problems --- how will you merge it with what
> you had before?

I'd love to see these issues resolved. The current postgresql.conf is
way over the top. Might you have a better idea?

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal - Collation at database level

Radek Strnad wrote:
> Zdenek Kotala wrote:
>> Radek Strnad wrote:
>>
>> <snip>
>>
>>>
>>> I'm thinking of dividing the problem into two parts - in the beginning
>>> pg_collation will contain two functions. One will have hard-coded rules
>>> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT,
>>> UCS_BASIC). It will compare each string character bitwise and guarantee
>>> that the implementation will meet the SQL standard implemented in
>>> PostgreSQL.
>>> The second one will allow the user to use installed system locales. The set
>>> of these collations will obviously vary between systems. Catalogs will
>>> contain encoding and collation for calling the system locale function.
>>> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592
>>> etc., if they are available.
>>>
>>> We will also need to change the way strings are compared. Depending on
>>> the database's collation setting, the right function will be used.
>>> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675

>>>
>>>
>>> This design will make a possible switch to ICU or any other
>>> implementation quite simple and will not cause any major rewriting of
>>> what I'm coding right now.
>>
>>
>> The collation function is the main point here. As you mentioned, one will
>> be only a wrapper around strcmp and the second one around strcoll (maybe
>> you need four - char/wchar). Which function will be used is defined in the
>> pg_collation catalog by the CREATE COLLATION command. But you need to
>> specify the name of the locale for system locales. That means you need an
>> attribute for storing the locale name.
>>
> You're right. I've extended pg_collation with system locale columns. In
> the first stage we actually don't need any other catalogs (for encoding,
> etc.) and we can build this functionality on the following pg_collation
> catalog alone. The collation function used (system or built-in) will be
> decided from the existing collation name.
>
> CATALOG(pg_collations, ###)
> {
>     NameData    colname;                /* collation name */
>     Oid         colschema;              /* collation schema */
>     NameData    colcharset;             /* character set specification */
>     Oid         colexistingcollation;   /* existing collation */
>     bool        colpadattribute;        /* pad attribute */
>     bool        colcasesensitive;       /* case sensitive */
>     bool        colaccent;              /* accent sensitive */
>     NameData    colsyslccollate;        /* lc_collate */
>     NameData    colsyslcctype;          /* lc_ctype */
>     regproc     colfunc;                /* used collation function */
> } FormData_pg_collations;
>
>
>> It would be good to send list of new and modified SQL commands (like
>> CREATE COLLATION) for wide discussion.
>>
> CREATE COLLATION <collation name> FOR <character set specification> FROM
> <existing collation name> [ <pad characteristic> ] [ <case sensitive> ]
> [ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ]
>
> <pad characteristic> := NO PAD | PAD SPACE
> <case sensitive> := CASE SENSITIVE | CASE INSENSITIVE
> <accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE
>
> Since you can specify ORDER BY in the SELECT clause, there's no need for
> adding ascending and descending types of collation. They will always be
> ascending.
>
> DROP COLLATION <collation name>
>
> CREATE DATABASE ... [ COLLATE <collation name> ] ...
>
> ALTER DATABASE ... [ COLLATE <collation name> ] ...
>
>

I think the catalog is good. Maybe the attribute names colsyslccollate and
colsyslcctype should be more generic, because they could be shared with ICU.

But the collation function should be specified in the CREATE COLLATION
command. Maybe CREATE COLLATION .... [STRCOL <fn name>]
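
For illustration, an invocation following the proposed grammar plus that
STRCOL extension might look like this (all names here are made up):

    CREATE COLLATION czech_utf8 FOR utf8 FROM ucs_basic
        PAD SPACE CASE SENSITIVE ACCENT SENSITIVE
        LC_COLLATE cs_CZ.utf8 LC_CTYPE cs_CZ.utf8
        STRCOL my_strcoll_wrapper;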


Zdenek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[GENERAL] Converting empty input strings to Nulls

Applications accessing my PostgreSQL 8.0 database like to submit no-value
date column values as empty strings rather than as Nulls. This, of course,
causes this PG error:

SQL State: 22007
ERROR: invalid input syntax for type date: ""

I'm looking for a way to trap this bad input at the database level, quietly
convert the input empty strings to Null, and store the Null in the date
column. I tried a BEFORE INSERT OR UPDATE trigger evoking this function ...

CREATE OR REPLACE FUNCTION "public"."empty_string_to_null"()
RETURNS trigger AS
$BODY$
BEGIN
IF CAST(NEW.birth_date AS text) = '' THEN
NEW.birth_date = Null;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

... but an empty string still evokes the error even before this function is
triggered.

Is there a way to convert empty strings to Nulls before the error is evoked?

~ TIA
~ Ken


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Packages in oracle Style

Hello

2008/5/31 <coutinho@mondriantecnologia.com>:
>
> I have implemented a solution for global variables in plpython and I need to implement this in C :)
>
> My implementation:
>
> select set_session('USER','coutinho');
> select set_session('EMAIL','coutinho@mondriantecnologia.com');
>
> select get_session('USER');
>> coutinho
>
> select get_session('EMAIL');
>> coutinho@mondriantecnologia.com
>
>

this is too simple :( the data are stored as text, not in the native Datum format
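
For reference, one plausible shape of the quoted functions (a guess, not the
actual implementation), which shows the text-only limitation:

    CREATE OR REPLACE FUNCTION set_session(k text, v text) RETURNS void AS $$
    BEGIN
        BEGIN
            CREATE TEMP TABLE session_vars (key text PRIMARY KEY, val text);
        EXCEPTION WHEN duplicate_table THEN
            NULL;  -- already created earlier in this session
        END;
        DELETE FROM session_vars WHERE key = k;
        INSERT INTO session_vars VALUES (k, v);
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION get_session(k text) RETURNS text AS $$
    DECLARE v text;
    BEGIN
        SELECT val INTO v FROM session_vars WHERE key = k;
        RETURN v;  -- NULL when the variable was never set
    END;
    $$ LANGUAGE plpgsql;

Everything round-trips through text, which is exactly the limitation above.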

Regards
Pavel Stehule

> On Wed, 28 May 2008 22:13:31 +0200, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>> 2008/5/27 Zdenek Kotala <Zdenek.Kotala@sun.com>:
>>> Coutinho wrote:
>>>>
>>>> this is listed on TODO:
>>>> http://www.postgresql.org/docs/faqs.TODO.html
>>>>
>>>> Add features of Oracle-style packages (Pavel)
>>>>
>>
>> My last idea was only global variables for plpgsql. It needs a hack of
>> plpgsql :(, but it could be simple work.
>>
>> Pavel
>>
>>>
>>> I see. Sorry I overlooked it. I think Pavel Stehule will help you. He
>>> has an idea how to do it.
>>>
>>> Zdenek
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [BUGS] what are the ways to avoid --- "ERROR: EXECUTE of SELECT ... INTO is not implemented yet"


Here is my real use case

1) We have normal SELECT ... INTO calls in our procedures
2) However we landed ourselves with the following problem
http://www.postgresql.org/docs/faqs.FAQ.html#item4.19

<snip>

4.19) Why do I get "relation with OID ##### does not exist" errors when accessing temporary tables in PL/PgSQL functions?

In PostgreSQL versions < 8.3, PL/PgSQL caches function scripts, and an unfortunate side effect is that if a PL/PgSQL function accesses a temporary table, and that table is later dropped and recreated, and the function called again, the function will fail because the cached function contents still point to the old temporary table. The solution is to use EXECUTE for temporary table access in PL/PgSQL. This will cause the query to be reparsed every time.

This problem does not occur in PostgreSQL 8.3 and later.

</snip>

3) So the solution we are trying to implement is the EXECUTE-based one.

This is what causes the SELECT ... INTO problem.


Upgrade is not an option :(

Currently the only available workaround is to use a temporary table: write
the value into the table and read it back from there instead of using a
local variable.

Any alternatives?

~
Shantanu

On Sat, May 31, 2008 at 10:13 AM, Craig Ringer <craig@postnewspapers.com.au> wrote:
Shantanu wrote:

ERROR:  EXECUTE of SELECT ... INTO is not implemented yet

PostgreSQL 8.3 includes support for EXECUTE ... INTO, so you may want to look at an upgrade.

Personally I wouldn't call this a bug at all. It's just that PostgreSQL knew how to parse that query before the support for actually executing it was written.

--
Craig Ringer

Re: [BUGS] what are the ways to avoid --- "ERROR: EXECUTE of SELECT ... INTO is not implemented yet"



On Sat, May 31, 2008 at 8:38 AM, <tomas@tuxteam.de> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, May 30, 2008 at 09:57:49PM +0530, Shantanu wrote:
> Hello experts,
>
>
> I am facing this error.
>
> mydb=> select version();
>                                             version
>
> --------------------------------------------------------------------------------
>
>  PostgreSQL 8.1.9
>
> (1 row)
>
> mydb=> \i /tmp/test.sql
> CREATE FUNCTION
> mydb=> select sp_test();
> ERROR:  EXECUTE of SELECT ... INTO is not implemented yet
> CONTEXT:  PL/pgSQL function "sp_test" line 4 at execute statement
>
>
> mydb=>
> [1]+  Stopped                 su - dbu
> mymachine<root># cat /tmp/test.sql
> create or replace function sp_test() RETURNS void as $$
> declare
>         l_var timestamp;
> begin
> execute 'SELECT NOW() INTO l_var';
 ^^^^^^^^^^^^^^^^^^

Why do you need execute '...' at all?  Why not directly do instead

 SELECT NOW() into l_var;

What is your "real" use case?

Note that if you really need the dynamic command, you might put its
result into a variable, like so:

 EXECUTE 'SELECT NOW()' INTO l_var;

...but if we don't know what you are trying to achieve, it's difficult
to provide meaningful recommendations.
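
If your server predates EXECUTE ... INTO (as 8.1 does), a FOR ... IN EXECUTE
loop over a one-row result is a common workaround; an untested sketch:

    create or replace function sp_test() returns void as $$
    declare
        l_var timestamp;
        r record;
    begin
        for r in execute 'SELECT now() AS ts' loop
            l_var := r.ts;  -- capture the dynamic query's result
        end loop;
    end;
    $$ language plpgsql;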

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIQMEjBcgs9XrR2kYRAnaWAJ9Mpu/qvqzi911yiftbe3lLYZiyLgCfcWIl
TKs78mgZiKFRZcHnmHMQnOw=
=I2iD
-----END PGP SIGNATURE-----

Re: [pgsql-es-ayuda]

On 31 May 2008 at 11:50, kervin licett <kerljose@gmail.com> wrote:
> I need help: for my thesis I am going to work with postgres and I want to
> work with phppgadmin, and when I connect to the server it does not
> recognize the user and password that postgres has when installed from
> synaptic. What is the default password?
>

By default it has no password. I assume you are using a Debian-flavoured
distro; if it is Ubuntu, type in a console
#sudo su - postgres
#psql

This will drop you into PostgreSQL's psql interpreter, which confirms that
your PostgreSQL installation is correct; then you only need to edit the
pg_hba.conf file to be able to use phppgadmin.
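
For example (adjust to your setup; the line below is just a common pattern
for local TCP connections with password authentication):

    # pg_hba.conf: let local TCP clients such as phppgadmin log in with md5
    host    all    all    127.0.0.1/32    md5

and, inside psql, give the postgres user a password first:

    ALTER USER postgres PASSWORD 'choose-a-password';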

regards.-
--
TIP 7: don't forget to increase the "free space map" configuration

Re: [HACKERS] Overhauling GUCS

Josh Berkus <josh@agliodbs.com> writes:
> Currently, postgresql.conf and our set of GUC configurations suffer from
> 4 large problems:

> 1. Most people have no idea how to set these.
> 2. The current postgresql.conf file is a huge mess of 194 options, the
> vast majority of which most users will never touch.
> 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
> and the settings.sgml), which are only synched with each other manually.
> 4. We don't seem to be getting any closer to autotuning.

The proposal doesn't actually solve any of those problems.

> Here's a list of the things we want to change. It's all a package and
> should make sense if you take all the changes as a whole.

> 1) Add several pieces of extra information to guc.c in the form of extra
> "gettext" commands: default value, subcategory, long description,
> recommendations, enum lists.
> 2) Incorporate this data into pg_settings
> 3) Delete postgresql.conf.sample
> 4) Add a script called pg_generate_conf to generate a postgresql.conf
> based on guc.c and command-line switches (rather than
> postgresql.conf.sample), which would be called automatically by initdb.

I disagree with doing any of this. It doesn't result in any useful
reduction in maintenance effort, and what it does do is make it
impossible to keep control over the detailed layout, formatting,
commenting etc in a sample postgresql.conf. Nor do I think that
"generate a whole config file from scratch" is going to be a useful
behavior for tuning problems --- how will you merge it with what
you had before?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [BUGS] BUG #4216: Index scan goes wrong with crosstype comparison and between in one statement

"Sokolov Yura aka "funny_falcon"" <funny.falcon@gmail.com> writes:
> I've created a new type - time_interval, defined operators on it and
> timestamp, and added those operators into OPERATOR FAMILY datetime_ops
> USING btree;

> When I query a table using BETWEEN and the equality timestamp =
> time_interval (which means the timestamp is included in the time_interval),
> the statement gives strange results.

I think your operator class is broken. You can't just invent some
randomly-defined operators that work sort of like a normal scalar
ordering and then expect btree indexes to work. In particular,
this set of operators violates the transitive law when taken together
with regular timestamp equality: for two timestamps T1 and T2 and
some interval TI, we can have T1 = TI and T2 = TI from which it
should follow that T1 = T2, yet that does not follow. But by putting
this notion of "equality" into the same opfamily as timestamp equality,
you are promising the system that it *does* follow. I didn't bother
digging through the code to see exactly where that assumption comes into
play, but that's basically what's biting you.
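
Concretely (time_interval and its semantics are from your report, not a
built-in type): if some TI covers all of May 2008, then

    '2008-05-01 12:00'::timestamp = TI   -- true under your operator
    '2008-05-20 12:00'::timestamp = TI   -- true under your operator

and transitivity through the shared opfamily would force the two timestamps
to be treated as equal, which they plainly are not.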

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs