Thursday, July 17, 2008

Re: [HACKERS] TABLE-function patch vs plpgsql

Hello

The core of the problem is that the standard doesn't know a RETURN NEXT
statement and knows only the RETURN TABLE statement - so PL/pgPSM or SQL
doesn't have this problem. I am not sure about PL/pgSQL, but I think that
using TABLE attributes as OUT variables is maybe too simple a solution -
there isn't any progress over the current state, and OUT variables are
typically a source of mistakes.

Maybe we can use some well-defined implicit record, maybe NEW (or
RESULT, ROW_RESULT, ROW, TABLE_ROW, ...) like a trigger does - something like:

create or replace function foo(i int) returns table(a int, b int) as $$
begin
  for j in 1..i loop
    new.a := j; new.b := j + 1;
    return next new; -- maybe only RETURN NEXT???
  end loop;
end;
$$ language plpgsql;
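
For comparison, under the semantics Tom proposes below (TABLE columns treated like OUT parameters), no implicit record would be needed at all - a sketch, assuming those semantics:

```sql
create or replace function foo(i int) returns table(a int, b int) as $$
begin
  for j in 1..i loop
    -- a and b are directly assignable, like OUT parameters
    a := j; b := j + 1;
    return next;  -- parameterless form returns the current values of a and b
  end loop;
end;
$$ language plpgsql;
```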

Regards
Pavel Stehule

2008/7/18 Tom Lane <tgl@sss.pgh.pa.us>:
> I've been working on the TABLE-function patch, and I am coming to the
> conclusion that it's really a bad idea for plpgsql to not associate
> variables with output columns --- that is, I think we should make
> RETURNS TABLE columns semantically just the same as OUT parameters.
> Here are some reasons:
>
> 1. It's ludicrous to argue that "standards compliance" requires the
> behavior-as-submitted. plpgsql is not specified by the SQL standard.
>
> 2. Not having the parameter names available means that you don't have
> access to their types either, which is a big problem for polymorphic
> functions. Read the last couple paragraphs of section 38.3.1:
> http://developer.postgresql.org/pgdocs/postgres/plpgsql-declarations.html#PLPGSQL-DECLARATION-ALIASES
> as well as the following 38.3.2. How would you do those things with
> a polymorphic TABLE column?
>
> 3. Not treating the parameters as assignable variables makes RETURN NEXT
> nearly worthless in a TABLE function. Since they're not assignable,
> you can't use the parameterless form of RETURN NEXT (which'd return
> the current values of the variables). The only alternative available
> is to return a record or row variable; but there's no convenient way
> to declare such a variable, since after all the whole point here is
> that the function's output rowtype is anonymous.
>
> 4. It's a whole lot easier to explain things if we can just say that
> OUT parameters and TABLE parameters work alike. This is especially
> true when they actually *are* alike for all the other available PLs.
>
> If we insist on the current definition then we are eventually going to
> need to kluge up some solutions to #2 and #3, which seems like make-work
> to me when we already have smooth solutions to these problems for
> OUT parameters.
>
> Comments?
>
> For the archives, here is the patch as I currently have it (with the
> no-plpgsql-variables behavior). But unless I hear a good argument
> to the contrary, I'm going to change that part before committing.
>
> regards, tom lane
>
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Load spikes on 8.1.11

On Fri, Jul 18, 2008 at 10:21 AM, Andrew Sullivan <ajs@commandprompt.com> wrote:
On Fri, Jul 18, 2008 at 10:05:33AM +0530, Gurjeet Singh wrote:

> I just ran DROP SCHEMA _<slony schema> CASCADE; and it spiked again, on a
> very low loaded box!!

Ah, well, if slony is involved, then you have possible locking
problems in the database _also_ to contend with, along with the
spinlock problems.  This will for sure cause spikes.

You need to tell us more about what you're doing.  And I bet some of
it belongs on the slony lists.


I am in the eye of the storm right now.

I just started the Slonik INIT CLUSTER command and that spiked too... it has been going for more than 10 minutes now!!


--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: [NOVICE] tracking down idle transactions in pg_locks

Thanks for your help. I was getting rather confident this app was to
blame; it just didn't make sense to me that it would show up on an
unrelated table. This gives me more information to contact the developer
with.

We are running 8.3. I have been able to reproduce the problem; it
occurs after an insert has occurred.
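
For anyone else chasing this, the sessions at fault can usually be spotted in pg_stat_activity; a sketch against 8.3's catalog (column names such as procpid and current_query were renamed in later releases):

```sql
-- backends sitting idle inside an open transaction
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```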


Tom Lane wrote:
> David Gardner <david@gardnerit.net> writes:
>
>> I've been tracking down a problem with idle transactions being left open
>> and not being able to vacuum one of my tables. I think I have tracked it
>> down to proprietary app, the problem seems to be that it leaves "idle
>> transactions" open.
>>
>
>
>> However the table that we are having problems vacuuming (probably just
>> because it is the most active table) is in the same db, but a different
>> scheme not accessed by that app. Could this app still be causing the
>> problem?
>>
>
> Different schema won't help --- VACUUM assumes that *any* other
> transaction in the same database might potentially be able to look at
> the target table later, so it can't remove rows that that transaction
> might be able to "see".
>
> The most common cause of this problem is apps that issue "COMMIT; BEGIN"
> and then go to sleep. If that's what this app is doing (if you don't
> know, turn on statement logging and find out...) then updating to a more
> recent PG version might help. In 8.3 a transaction doesn't block VACUUM's
> row reclamation until it's done something more than just BEGIN.
>
> regards, tom lane
>


--
David Gardner


--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Re: [HACKERS] [PATCH]-hash index improving

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Xiao Meng escribió:
>> You can undefine the macro HASHVALUE_ONLY in hash.h to get the
>> original implementation.

> I think having the HASHVALUE_ONLY define is not a good idea -- it just
> makes the patch harder to read.

While we are griping about readability: failure to update the comments
to match the code is NOT, NOT, NOT acceptable. I had barely started
to read the patch before encountering this insult to the reader:

/* Hash indexes are never lossy (at the moment anyway) */
- scan->xs_recheck = false;
+#ifdef HASHVALUE_ONLY
+ scan->xs_recheck = true;
+#else
+ scan->xs_recheck = false;
+#endif

Re: [HACKERS] Load spikes on 8.1.11

On Fri, Jul 18, 2008 at 10:05:33AM +0530, Gurjeet Singh wrote:

> I just ran DROP SCHEMA _<slony schema> CASCADE; and it spiked again, on a
> very low loaded box!!

Ah, well, if slony is involved, then you have possible locking
problems in the database _also_ to contend with, along with the
spinlock problems. This will for sure cause spikes.

You need to tell us more about what you're doing. And I bet some of
it belongs on the slony lists.

A
--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/


Re: [HACKERS] Load spikes on 8.1.11

"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
> Can you please elaborate on what high level diagnosis would you need?

Well, we'd need some idea of which spinlock is being contended for...

> I just ran DROP SCHEMA _<slony schema> CASCADE; and it spiked again, on a
> very low loaded box!!

That *might* mean that the problem is contention for SInvalLock, since
dropping a large schema would result in a lot of sinval traffic. Or
maybe it's something else. Do your spikes correspond to large DDL
changes?

If your platform has oprofile or DTrace or some such then getting an
execution profile with that type of tool would tell something.

regards, tom lane


Re: [HACKERS] Load spikes on 8.1.11

On Fri, Jul 18, 2008 at 10:05 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
On Fri, Jul 18, 2008 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
>     During these spikes, in the 'top' sessions we see the 'idle' PG
> processes consuming between 2 and 5 % CPU, and since the box has 8 CPUS (2
> sockets and each CPU is a quad core Intel Xeon processors) and somewhere
> around 200 Postgres processes, the load spikes to above 200; and it does
> this very sharply.

This looks like heavy contention for a spinlock.  You need to get a
higher-level analysis of what's happening before anyone can say much
more than that.

Note that 8.1 is pretty much ancient history as far as scalability to
8-core hardware goes.  You should probably consider updating to 8.3
before investing too much time in tracking down what's happening.
If you can still show the problem on 8.3 then there would be some
interest in fixing it ...

Upgrading is on the cards, but not as high priority as I would like it to be! This is a production box, and we desperately need some respite from these spikes.

Can you please elaborate on what high-level diagnosis you would need?

I just ran DROP SCHEMA _<slony schema> CASCADE; and it spiked again, on a very low loaded box!!

Thanks for all your help.

Would reducing the number of connections on the DB help in reducing the spike?



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: [PERFORM] index scan cost

Jeff Frost <jeff@frostconsultingllc.com> writes:
> I have two postgresql servers. One runs 8.3.1, the other 8.3.3. On the 8.3.1
> machine, the index scans are being planned extremely low cost:

> Index Scan using ix_email_entity_thread on email_entity (cost=0.00..4.59
> rows=1 width=1031) (actual time=0.095..0.120 rows=4 loops=1)
> Index Cond: (email_thread = 375629157)

> Index Scan using ix_email_entity_thread on email_entity (cost=0.00..2218.61
> rows=1151 width=931) (actual time=0.094..0.111 rows=4 loops=1)
> Index Cond: (email_thread = 375629157)

This isn't a "cost" problem, this is a "stats" problem. Why does the
second server think 1151 rows will be returned? Try comparing the
pg_stats entries for the email_thread column on both servers ... seems
like they must be significantly different.
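
A concrete way to make that comparison - a sketch, reusing the table and column names from the plans above; run it on both servers and diff the results:

```sql
-- large differences in n_distinct or most_common_vals/most_common_freqs
-- would explain the divergent row estimates
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'email_entity'
  AND attname = 'email_thread';
```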

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [HACKERS] Load spikes on 8.1.11

On Fri, Jul 18, 2008 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
>     During these spikes, in the 'top' sessions we see the 'idle' PG
> processes consuming between 2 and 5 % CPU, and since the box has 8 CPUS (2
> sockets and each CPU is a quad core Intel Xeon processors) and somewhere
> around 200 Postgres processes, the load spikes to above 200; and it does
> this very sharply.

This looks like heavy contention for a spinlock.  You need to get a
higher-level analysis of what's happening before anyone can say much
more than that.

Note that 8.1 is pretty much ancient history as far as scalability to
8-core hardware goes.  You should probably consider updating to 8.3
before investing too much time in tracking down what's happening.
If you can still show the problem on 8.3 then there would be some
interest in fixing it ...

Upgrading is on the cards, but not as high priority as I would like it to be! This is a production box, and we desperately need some respite from these spikes.

Can you please elaborate on what high-level diagnosis you would need?

I just ran DROP SCHEMA _<slony schema> CASCADE; and it spiked again, on a very low loaded box!!

Thanks for all your help.



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: [HACKERS] Load spikes on 8.1.11

"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
> During these spikes, in the 'top' sessions we see the 'idle' PG
> processes consuming between 2 and 5 % CPU, and since the box has 8 CPUS (2
> sockets and each CPU is a quad core Intel Xeon processors) and somewhere
> around 200 Postgres processes, the load spikes to above 200; and it does
> this very sharply.

This looks like heavy contention for a spinlock. You need to get a
higher-level analysis of what's happening before anyone can say much
more than that.

Note that 8.1 is pretty much ancient history as far as scalability to
8-core hardware goes. You should probably consider updating to 8.3
before investing too much time in tracking down what's happening.
If you can still show the problem on 8.3 then there would be some
interest in fixing it ...

regards, tom lane


[pgus-board] Itinerary

Hey, y'all; if everything goes well, I should arrive in Portland at
10:14 AM your (local) time on Friday (today [here]). (Frontier 791D,
if it matters ;) ). That should give me a fair amount of time to sign
forms, write PgUS flyer inserts, print/copy same at Kinkos, etc...

Looking forward to seeing everyone! :)

---Michael Brewer
mbrewer@gmail.com

(I really should get some sleep, considering I have to be up in 3 hours.)

--
Sent via pgus-board mailing list (pgus-board@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgus-board

Re: [HACKERS] Re: [COMMITTERS] pgsql: Add column storage type to psql \d+ display.

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Bruce Momjian wrote:
>> I don't see that warning with my compiler so I have no way of testing
>> this, but I do see this line pretty high in the function:
>> tableinfo.relkind = *(PQgetvalue(res, 0, 1));

> But it's before the first "goto error_return", after which it is checked.

Ah. That code is indeed broken, or at least risky in the extreme. What
the cleanup code ought to be checking is just whether or not the arrays
have gotten allocated yet. Checking a condition that should later lead
to the array getting allocated is just asking for trouble --- even if
there's not a "goto error_return" in between today, someday someone will
insert one.

Patch applied ...

regards, tom lane


[COMMITTERS] pgsql: Suppress compiler warning, and not incidentally make the code

Log Message:
-----------
Suppress compiler warning, and not incidentally make the code more
robust. The previous coding was quite risky because it was testing
conditions different from 'is the array really allocated?'.

Modified Files:
--------------
pgsql/src/bin/psql:
describe.c (r1.183 -> r1.184)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/bin/psql/describe.c?r1=1.183&r2=1.184)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Re: [GENERAL] Reducing memory usage of insert into select operations?

Douglas McNaught writes:


> It does seem that reducing work_mem might help you, but others on this

I reduced it from 256MB to 64MB, and it seems to be helping.
At 256MB the usage per DB connection instance was upwards of 12GB; at 64MB
so far it is around 7GB. I have just reduced it further to 32MB to see how
that works.
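
Note that work_mem can also be lowered for just the session running the big statement, instead of globally - a sketch (target_table and source_table are placeholder names):

```sql
SET work_mem = '32MB';          -- applies to this session only
INSERT INTO target_table
SELECT * FROM source_table;     -- runs under the lower limit
RESET work_mem;                 -- back to the server default
```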


> It would probably be helpful for you to post the EXPLAIN output from
Hash Join  (cost=712213.57..27293913.33 rows=234402352 width=24)
  Hash Cond: (coh.id = ids.id)
  ->  Hash Join  (cost=551387.26..18799378.16 rows=234402352 width=22)
        Hash Cond: (coh.user_id = ca.user_id)
        ->  Seq Scan on customer_original_historical coh  (cost=0.00..6702501.40 rows=234402352 width=47)
              Filter: (yearmo > '200703'::bpchar)
        ->  Hash  (cost=268355.67..268355.67 rows=14637567 width=32)
              ->  Seq Scan on cards ca  (cost=0.00..268355.67 rows=14637567 width=32)
  ->  Hash  (cost=77883.25..77883.25 rows=5055525 width=6)
        ->  Seq Scan on customer_ids ids  (cost=0.00..77883.25 rows=5055525 width=6)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Load spikes on 8.1.11

Just an addition... the strace output with the selects timing out just runs almost continuously; it doesn't seem to pause anywhere!

On Fri, Jul 18, 2008 at 9:16 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
Hi All,

    I have been perplexed by random load spikes on an 8.1.11 instance. Many a time they seem random, in the sense that we cannot tie them to a particular scenario as the cause! But a few times we can see that when we are executing huge scripts, which include DDL as well as DML, the load on the box spikes to above 200. We see similar load spikes at other times too, when we are not running any such task on the DB.

    During these spikes, in the 'top' sessions we see the 'idle' PG processes consuming between 2 and 5% CPU, and since the box has 8 CPUs (2 sockets, each a quad-core Intel Xeon) and somewhere around 200 Postgres processes, the load spikes to above 200; and it does this very sharply.

    We are running the scripts using psql -f, but we can see the load even while running the commands one by one!

    When there's no load, an strace session on an 'idle' PG process looks like:

[postgres@db1 data]$ strace -p 9375
Process 9375 attached - interrupt to quit
recvfrom(9,  <unfinished ...>
Process 9375 detached


    But under these heavy load conditions, an 'idle' PG process' strace looks like:

[postgres@db1 data]$ strace -p 22994
Process 22994 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 11000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 14000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 17000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 31000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 51000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 5000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 12000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 12000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 23000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 27000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 47000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 70000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 11000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 19000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 35000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 53000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 75000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 76000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 102000}) = 0 (Timeout)
Process 22994 detached


    So I guess there's something very wrong with the above 'select' calls.

    Can somebody please shed some light on this? Let me know what OS/hardware specs you need.

    Any help is greatly appreciated.

Thanks in advance,


--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device



--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

[HACKERS] Load spikes on 8.1.11

Hi All,

    I have been perplexed by random load spikes on an 8.1.11 instance. Many a time they seem random, in the sense that we cannot tie them to a particular scenario as the cause! But a few times we can see that when we are executing huge scripts, which include DDL as well as DML, the load on the box spikes to above 200. We see similar load spikes at other times too, when we are not running any such task on the DB.

    During these spikes, in the 'top' sessions we see the 'idle' PG processes consuming between 2 and 5% CPU, and since the box has 8 CPUs (2 sockets, each a quad-core Intel Xeon) and somewhere around 200 Postgres processes, the load spikes to above 200; and it does this very sharply.

    We are running the scripts using psql -f, but we can see the load even while running the commands one by one!

    When there's no load, an strace session on an 'idle' PG process looks like:

[postgres@db1 data]$ strace -p 9375
Process 9375 attached - interrupt to quit
recvfrom(9,  <unfinished ...>
Process 9375 detached


    But under these heavy load conditions, an 'idle' PG process' strace looks like:

[postgres@db1 data]$ strace -p 22994
Process 22994 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 11000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 14000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 17000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 31000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 51000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 5000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 3000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 6000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 12000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 12000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 23000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 27000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 47000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 70000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 2000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 4000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 7000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 11000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 19000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 35000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 53000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 75000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 76000}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 102000}) = 0 (Timeout)
Process 22994 detached


    So I guess there's something very wrong with the above 'select' calls.

    Can somebody please shed some light on this? Let me know what OS/hardware specs you need.

    Any help is greatly appreciated.

Thanks in advance,


--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: [GENERAL] Reducing memory usage of insert into select operations?

On Thu, Jul 17, 2008 at 9:27 PM, Francisco Reyes <lists@stringsutils.com> wrote:
> Douglas McNaught writes:
>
>> Is this a 32-bit installation or 64-bit? 3GB of shared_buffers is way
>> too big for a 32-bit setup.
>
>
> 64-bit.
> The machine has 12GB of RAM, so shared_buffers is about 1/3.
> Dedicated DB server.

Ahh, good. Just wanted to answer the obvious question first. Some
people set shared_buffers really high on 32-bit systems and then are
surprised when it doesn't work well.

It does seem that reducing work_mem might help you, but others on this
list are much more expert than I in diagnosing this sort of problem.
It would probably be helpful for you to post the EXPLAIN output from
your query, so they can see which part of the plan causes the large
memory usage.

-Doug


Re: [HACKERS] Re: [COMMITTERS] pgsql: Add column storage type to psql \d+ display.

Bruce Momjian wrote:
> Alvaro Herrera wrote:

> > I think this patch has caused this new warning in psql:
> >
> > /pgsql//source/00head/src/bin/psql/describe.c: In function ?describeOneTableDetails?:
> > /pgsql//source/00head/src/bin/psql/describe.c:832: warning: ?tableinfo.relkind? may be used uninitialized in this function
>
> I don't see that warning with my compiler so I have no way of testing
> this, but I do see this line pretty high in the function:
>
> tableinfo.relkind = *(PQgetvalue(res, 0, 1));

But it's before the first "goto error_return", after which it is checked.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [COMMITTERS] pgsql: Support "variadic" functions, which can accept a variable number

Simon Riggs <simon@2ndquadrant.com> writes:
> I think CREATE FUNCTION should specifically mention that VARIADIC can
> only be used as the arg mode of the last parameter. Other pages mention
> this, but the syntax doesn't, which could confuse people. Like me, who
> started with the CREATE FUNCTION page, rather than other parts of the
> docs.

Done in connection with the TABLE-function docs update.
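
For the archives, the restriction looks like this in practice - an untested sketch in the style of the 8.4 examples (unnest() is itself new in 8.4):

```sql
-- VARIADIC must be the last parameter; declaring another
-- parameter after it is rejected by CREATE FUNCTION
CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$
    SELECT min(t.v) FROM unnest($1) AS t(v);
$$ LANGUAGE sql;
```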

regards, tom lane


[COMMITTERS] pgsql: Implement SQL-spec RETURNS TABLE syntax for functions.

Log Message:
-----------
Implement SQL-spec RETURNS TABLE syntax for functions.

(Unlike the original submission, this patch treats TABLE output parameters
as being entirely equivalent to OUT parameters -- tgl)

Pavel Stehule

Modified Files:
--------------
pgsql/doc/src/sgml:
catalogs.sgml (r2.170 -> r2.171)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/catalogs.sgml?r1=2.170&r2=2.171)
func.sgml (r1.441 -> r1.442)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/func.sgml?r1=1.441&r2=1.442)
plpgsql.sgml (r1.132 -> r1.133)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/plpgsql.sgml?r1=1.132&r2=1.133)
xfunc.sgml (r1.131 -> r1.132)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/xfunc.sgml?r1=1.131&r2=1.132)
pgsql/doc/src/sgml/ref:
create_function.sgml (r1.79 -> r1.80)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/create_function.sgml?r1=1.79&r2=1.80)
pgsql/src/backend/catalog:
information_schema.sql (r1.44 -> r1.45)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/information_schema.sql?r1=1.44&r2=1.45)
pg_proc.c (r1.152 -> r1.153)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/pg_proc.c?r1=1.152&r2=1.153)
pgsql/src/backend/commands:
functioncmds.c (r1.97 -> r1.98)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/functioncmds.c?r1=1.97&r2=1.98)
pgsql/src/backend/parser:
gram.y (r2.617 -> r2.618)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/gram.y?r1=2.617&r2=2.618)
pgsql/src/backend/utils/adt:
ruleutils.c (r1.277 -> r1.278)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/ruleutils.c?r1=1.277&r2=1.278)
pgsql/src/backend/utils/fmgr:
funcapi.c (r1.40 -> r1.41)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/fmgr/funcapi.c?r1=1.40&r2=1.41)
pgsql/src/bin/pg_dump:
pg_dump.c (r1.495 -> r1.496)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/bin/pg_dump/pg_dump.c?r1=1.495&r2=1.496)
pgsql/src/bin/psql:
describe.c (r1.182 -> r1.183)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/bin/psql/describe.c?r1=1.182&r2=1.183)
pgsql/src/include/catalog:
catversion.h (r1.470 -> r1.471)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/catversion.h?r1=1.470&r2=1.471)
pg_proc.h (r1.508 -> r1.509)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_proc.h?r1=1.508&r2=1.509)
pgsql/src/include/nodes:
parsenodes.h (r1.367 -> r1.368)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/parsenodes.h?r1=1.367&r2=1.368)
pgsql/src/include/utils:
builtins.h (r1.318 -> r1.319)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/utils/builtins.h?r1=1.318&r2=1.319)
pgsql/src/pl/plpgsql/src:
pl_comp.c (r1.127 -> r1.128)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpgsql/src/pl_comp.c?r1=1.127&r2=1.128)
pgsql/src/pl/plpython:
plpython.c (r1.111 -> r1.112)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/pl/plpython/plpython.c?r1=1.111&r2=1.112)
pgsql/src/test/regress/expected:
plpgsql.out (r1.65 -> r1.66)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/plpgsql.out?r1=1.65&r2=1.66)
rangefuncs.out (r1.17 -> r1.18)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/rangefuncs.out?r1=1.17&r2=1.18)
pgsql/src/test/regress/sql:
plpgsql.sql (r1.55 -> r1.56)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/plpgsql.sql?r1=1.55&r2=1.56)
rangefuncs.sql (r1.7 -> r1.8)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/rangefuncs.sql?r1=1.7&r2=1.8)


Re: [GENERAL] Reducing memory usage of insert into select operations?

Douglas McNaught writes:

> Is this a 32-bit installation or 64-bit? 3GB of shared_buffers is way
> too big for a 32-bit setup.


64-bit.
The machine has 12GB of RAM, so shared_buffers is about 1/3.
Dedicated DB server.


Re: [NOVICE] tracking down idle transactions in pg_locks

David Gardner <david@gardnerit.net> writes:
> I've been tracking down a problem with idle transactions being left open
> and not being able to vacuum one of my tables. I think I have tracked it
> down to proprietary app, the problem seems to be that it leaves "idle
> transactions" open.

> However the table that we are having problems vacuuming (probably just
> because it is the most active table) is in the same db, but a different
> scheme not accessed by that app. Could this app still be causing the
> problem?

Different schema won't help --- VACUUM assumes that *any* other
transaction in the same database might potentially be able to look at
the target table later, so it can't remove rows that that transaction
might be able to "see".

The most common cause of this problem is apps that issue "COMMIT; BEGIN"
and then go to sleep. If that's what this app is doing (if you don't
know, turn on statement logging and find out...) then updating to a more
recent PG version might help. In 8.3 a transaction doesn't block VACUUM's
row reclamation until it's done something more than just BEGIN.
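For anyone hunting these down, a quick way to spot such sessions (a sketch, assuming the 8.3-era pg_stat_activity columns procpid, xact_start and current_query) is:

```sql
-- List backends sitting idle inside an open transaction.
-- In 8.3, pg_stat_activity shows the literal string '<IDLE> in transaction'
-- in current_query for such sessions (column names changed in later releases).
SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;
```

The session with the oldest xact_start is typically the one holding back VACUUM.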

regards, tom lane


Re: [GENERAL] Reducing memory usage of insert into select operations?

On Thu, Jul 17, 2008 at 7:21 PM, Francisco Reyes <lists@stringsutils.com> wrote:
> Redhat 4
> postgresql 8.3.3
> Memory: 12GB
>
> While doing a couple of operations of the type
> insert into <table> select from <some other table>
>
> The OS triggered the out of memory killer (oom-killer).

Is this a 32-bit installation or 64-bit? 3GB of shared_buffers is way
too big for a 32-bit setup.

-Doug


Re: [GENERAL] TODO list and "hyphen"

Scara Maccai wrote:
> Hi,
>
> I was looking at the TODO:
>
> http://www.postgresql.org/docs/faqs.TODO.html
>
>
> "A hyphen, "-", marks changes that will appear in the upcoming 8.4 release."
>
> Well, making a search for the "-" sign is complicated... it's obviously
> used for a lot of other things... could you use another character?

I think a search for ' -' will show you the items you want.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


Re: [GENERAL] TODO list and "hyphen"

salman wrote:
> >> -Allow administrators to safely terminate individual sessions either
> via an SQL function or SIGTERM <<
>
> Can't you already do this with pg_cancel_backend() or kill <child pid>?

pg_cancel_backend() only cancels the current query; later queries are
still executed. Kill was never officially supported as a way to
terminate a backend, but it will be in 8.4, and there is an SQL function
to do it too.
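Roughly, the difference looks like this (12345 is a hypothetical backend PID; pg_terminate_backend() is the 8.4 SQL function referred to above):

```sql
-- Cancel only the query currently running in backend 12345;
-- the session itself stays connected and can issue further queries.
SELECT pg_cancel_backend(12345);

-- 8.4 and later: end the whole session, equivalent to a safe SIGTERM.
SELECT pg_terminate_backend(12345);
```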

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com



[HACKERS] Re: [COMMITTERS] pgsql: Add column storage type to psql \d+ display.

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Log Message:
> > -----------
> > Add column storage type to psql \d+ display.
>
> I think this patch has caused this new warning in psql:
>
> /pgsql//source/00head/src/bin/psql/describe.c: In function ‘describeOneTableDetails’:
> /pgsql//source/00head/src/bin/psql/describe.c:832: warning: ‘tableinfo.relkind’ may be used uninitialized in this function

I don't see that warning with my compiler so I have no way of testing
this, but I do see this line pretty high in the function:

tableinfo.relkind = *(PQgetvalue(res, 0, 1));

Do you have any idea how relkind could be accessed before this
assignment? The line number it is complaining about is the definition
of relkind in the structure, not any reference to the variable.

The only crazy idea I have is that the variable name appears in a string
above the assignment:

"SELECT relhasindex, relkind, relchecks, reltriggers, relhasrules, "

And relkind is the only column that matches a structure member.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com



Re: [pdxpug] PDX PostgreSQL Meeting

On Wed, Jul 16, 2008 at 5:42 PM, Tim Bruce <timsnews@tbruce.com> wrote:
> After talking (ahhm, e-mailing) with Selena, we won't be having our normal
> meeting tomorrow (Thursday, July 17th). The Sunday get-together (listed on
> the pdxpug web site - http://pugs.postgresql.org/pdx) will replace it.
>
> So don't go to Free Geek tomorrow looking for us!

D'oh! I didn't see this until now. Too much email yesterday.


Re: [COMMITTERS] pgsql: Allow TRUNCATE foo, foo to succeed, per report from Nikhils.

Simon Riggs wrote:
> > Huh? There was a bug report, with suggested fix on June 5th from
> > Nikhil - http://archives.postgresql.org/pgsql-hackers/2008-06/msg00231.php
> >
> > No-one responded, and over a month later Bruce fixed the bug, pointing
> > out that TRUNCATE is now consistent with LOCK.
> >
> > The fact that Bruce fixed it during a commit fest is irrelevant. Bug
> > fixes don't stop for commit fests
>
> OK, my mistake. Comments withdrawn, with apologies.

The ugly truth is that my schedule is now so crazy with travel that I am
having to catch up with community emails that need attention anytime I
have the chance, rather than at appropriate times.

Recent schedule:

-1 week, church mission trip to Pittsburgh
+1 week, OSCON
+3 weeks, Linuxworld
+6 weeks, Brasilia conference

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com



[COMMITTERS] pgsql: Add URL for: * Implement SQL:2003 window functions >

Log Message:
-----------
Add URL for:

* Implement SQL:2003 window functions

> http://archives.postgresql.org/pgsql-hackers/2008-07/msg00232.php

Modified Files:
--------------
pgsql/doc:
TODO (r1.2491 -> r1.2492)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/TODO?r1=1.2491&r2=1.2492)
pgsql/doc/src/FAQ:
TODO.html (r1.997 -> r1.998)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/doc/src/FAQ/TODO.html?r1=1.997&r2=1.998)


Re: [pgsql-advocacy] PostgreSQL passes MySQL for Freshmeat Downloads

Marc G. Fournier wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Josh was added as Admin a few weeks ago ... which you would know if you had
> taken to actually check the entry :)

Oh, I see it now. Sorry. :-(

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com



[NOVICE] tracking down idle transactions in pg_locks

I've been tracking down a problem with idle transactions being left open
and not being able to vacuum one of my tables. I think I have tracked it
down to a proprietary app; the problem seems to be that it leaves "idle
transactions" open.

However the table that we are having problems vacuuming (probably just
because it is the most active table) is in the same db, but a different
schema not accessed by that app. Could this app still be causing the
problem?

I did a select on the pg_locks table and filtered by the pid of the idle
transaction. There were 8 locks, 7 were on tables that the app uses
however the 8th was of locktype=virtualxid and mode=ExclusiveLock. Could
this be some sort of global lock?

I have been able to verify that closing the app, does allow us to vacuum
the table.

--
David Gardner



Re: [PATCHES] WITH RECURSIVE patches 0717

> > Here are the latest WITH RECURSIVE patches against CVS HEAD, created by
> > Yoshiyuki Asaba with minor corrections by Tatsuo Ishii.
>
> I tried this patch vs. CVS HEAD using my usual configure options with
> only --with-prefix set, then tried to make, and got:
>
> make[3]: *** No rule to make target `parse_cte.o', needed by `objfiles.txt'. Stop.
> make[3]: Leaving directory `/home/shackle/pgsql/src/backend/parser'
> make[2]: *** [parser-recursive] Error 2
> make[2]: Leaving directory `/home/shackle/pgsql/src/backend'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/home/shackle/pgsql/src'
> make: *** [all] Error 2
>
> Is there something missing?

Oops. I forgot to include patches against the newly added files. Please
try the included patches.

> > (David Fetter's psql help patches are not included. It seems his git
> > repository has gone).
>
> I apologize for that. I rearranged it last night because the name was
> not scalable, but delayed sending this out until today. It can now be
> found at
>
> <http://git.postgresql.org/?p=~davidfetter/with_recursive/.git;a=summary>
>
> To pull from the new location, in your .git/config, change URL from something
> like the following:
>
> url = git://davidfetter@git.postgresql.org/git/~davidfetter/postgresql/.git
>
> to
>
> url = git://davidfetter@git.postgresql.org/git/~davidfetter/with_recursive/.git

Let me try later.

> > This version implements:
> >
> > - detect certain queries that are not valid according to the standard
>
> Great :)
>
> > I also include erroneous query examples created by Yoshiyuki (probably
> > will become part of regression tests).
> >
> > Remaining problems are:
> >
> > 1) sort query names according to their dependencies
>
> This can be done at query time already using arrays per Asaba-san's
> suggestion. I'll add some examples to the documentation.

1) refers to mutually recursive queries.
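For reference, restricting WITH RECURSIVE to a single query name would still allow the common self-referencing form; a minimal sketch:

```sql
-- One query name, t, referring only to itself: allowed under the
-- proposed restriction. Mutually recursive query names (two WITH
-- entries referencing each other) would be rejected.
WITH RECURSIVE t(n) AS (
    SELECT 1
  UNION ALL
    SELECT n + 1 FROM t WHERE n < 5
)
SELECT n FROM t;
```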

> > 2) planner always estimate 0 cost for recursion plans
> > 3) add regression tests
> >
> > For 1), I have proposed we limit query names to 1, in other words, do
> > not allow mutually recursive queries. For 2) there's no good idea to
> > solve it, so I suggest leaving it as it is now.
> >
> > For 3) I will generate regression tests as soon as possible.
> >
> > So the patches seem to be almost ready to commit IMO.
>
> Wonderful!
>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
> Skype: davidfetter XMPP: david.fetter@gmail.com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate

[GENERAL] Reducing memory usage of insert into select operations?

Redhat 4
postgresql 8.3.3
Memory: 12GB

While doing a couple of operations of the type
insert into <table> select from <some other table>

The OS triggered the out of memory killer (oom-killer).

After some research and trial/error I found it was the inserts.
I see one of the inserts is using up 12GB!

How can I reduce the usage?
postgresql.conf settings:
shared_buffers = 3GB
temp_buffers = 64MB # min 800kB
work_mem = 256MB # min 64kB
maintenance_work_mem = 1GB

Would reducing work_mem help?

The table I am selecting from has a few hundred million rows.
The table I am inserting into has partitions. I am benchmarking breaking up
a large table into smaller partitions.
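One thing worth trying (illustrative table names and value, not a known fix for the memory blow-up itself) is lowering work_mem for just the offending session:

```sql
-- Per-session override: takes effect only for this connection,
-- without touching postgresql.conf for other workloads.
SET work_mem = '64MB';
INSERT INTO target_partitioned_table SELECT * FROM big_source_table;
RESET work_mem;
```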


Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

Index: src/backend/tsearch/wparser_def.c
===================================================================
RCS file: /home/postgres/devel/pgsql-cvs/pgsql/src/backend/tsearch/wparser_def.c,v
retrieving revision 1.15
diff -c -r1.15 wparser_def.c
*** src/backend/tsearch/wparser_def.c 17 Jun 2008 16:09:06 -0000 1.15
--- src/backend/tsearch/wparser_def.c 18 Jul 2008 00:07:22 -0000
***************
*** 1684,1701 ****
return false;
}

! Datum
! prsd_headline(PG_FUNCTION_ARGS)
{
! HeadlineParsedText *prs = (HeadlineParsedText *) PG_GETARG_POINTER(0);
! List *prsoptions = (List *) PG_GETARG_POINTER(1);
! TSQuery query = PG_GETARG_TSQUERY(2);

! /* from opt + start and and tag */
! int min_words = 15;
! int max_words = 35;
! int shortword = 3;

int p = 0,
q = 0;
int bestb = -1,
--- 1684,1941 ----
return false;
}

! static void
! mark_fragment(HeadlineParsedText *prs, int highlight, int startpos, int endpos)
{
! int i;
! char *coversep = "... ";
! int seplen = strlen(coversep);

! for (i = startpos; i <= endpos; i++)
! {
! if (prs->words[i].item)
! prs->words[i].selected = 1;
! if (highlight == 0)
! {
! if (HLIDIGNORE(prs->words[i].type))
! prs->words[i].replace = 1;
! }
! else
! {
! if (XMLHLIDIGNORE(prs->words[i].type))
! prs->words[i].replace = 1;
! }
!
! prs->words[i].in = (prs->words[i].repeated) ? 0 : 1;
! }
! /* add cover separators if needed */
! if (startpos > 0 && startpos <= endpos)
! {
!
! prs->words[startpos-1].word = repalloc(prs->words[startpos-1].word, sizeof(char) * seplen);
! prs->words[startpos-1].in = 1;
! prs->words[startpos-1].len = seplen;
! memcpy(prs->words[startpos-1].word, coversep, seplen);
! }
! }
!
! typedef struct
! {
! int4 startpos;
! int4 endpos;
! int4 poslen;
! int4 curlen;
! int2 in;
! int2 excluded;
! } CoverPos;
!
! static void
! get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos,
! int *curlen, int *poslen, int max_words)
! {
! int i;
! /* Objective: Generate a fragment of words between startpos and endpos
! * such that it has at most max_words and both ends has query words.
! * If the startpos and endpos are the endpoints of the cover and the
! * cover has fewer words than max_words, then this function should
! * just return the cover
! */
! /* first move startpos to an item */
! for(i = *startpos; i <= *endpos; i++)
! {
! *startpos = i;
! if (prs->words[i].item && !prs->words[i].repeated)
! break;
! }
! /* cut endpos to have only max_words */
! *curlen = 0;
! *poslen = 0;
! for(i = *startpos; i <= *endpos && *curlen < max_words; i++)
! {
! if (!NONWORDTOKEN(prs->words[i].type))
! *curlen += 1;
! if (prs->words[i].item && !prs->words[i].repeated)
! *poslen += 1;
! }
! /* if the cover was cut then move back endpos to a query item */
! if (*endpos > i)
! {
! *endpos = i;
! for(i = *endpos; i >= *startpos; i --)
! {
! *endpos = i;
! if (prs->words[i].item && !prs->words[i].repeated)
! break;
! if (!NONWORDTOKEN(prs->words[i].type))
! *curlen -= 1;
! }
! }
! }
!
! static void
! mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, int highlight,
! int shortword, int min_words,
! int max_words, int max_fragments)
! {
! int4 poslen, curlen, i, f, num_f = 0;
! int4 stretch, maxstretch, posmarker;
!
! int4 startpos = 0,
! endpos = 0,
! p = 0,
! q = 0;
!
! int4 numcovers = 0,
! maxcovers = 32;
!
! int4 minI, minwords, maxitems;
! CoverPos *covers;
!
! covers = palloc(maxcovers * sizeof(CoverPos));
!
! /* get all covers */
! while (hlCover(prs, query, &p, &q))
! {
! startpos = p;
! endpos = q;
!
! /* Break the cover into smaller fragments such that each fragment
! * has at most max_words. Also ensure that each end of the fragment
! * is a query word. This will allow us to stretch the fragment in
! * either direction
! */
!
! while (startpos <= endpos)
! {
! get_next_fragment(prs, &startpos, &endpos, &curlen, &poslen, max_words);
! if (numcovers >= maxcovers)
! {
! maxcovers *= 2;
! covers = repalloc(covers, sizeof(CoverPos) * maxcovers);
! }
! covers[numcovers].startpos = startpos;
! covers[numcovers].endpos = endpos;
! covers[numcovers].curlen = curlen;
! covers[numcovers].poslen = poslen;
! covers[numcovers].in = 0;
! covers[numcovers].excluded = 0;
! numcovers ++;
! startpos = endpos + 1;
! endpos = q;
! }
! /* move p to generate the next cover */
! p++;
! }

+ /* choose best covers */
+ for (f = 0; f < max_fragments; f++)
+ {
+ maxitems = 0;
+ minwords = 0x7fffffff;
+ minI = -1;
+ /* Choose the cover that contains max items.
+ * In case of tie choose the one with smaller
+ * number of words.
+ */
+ for (i = 0; i < numcovers; i ++)
+ {
+ if (!covers[i].in && !covers[i].excluded &&
+ (maxitems < covers[i].poslen || (maxitems == covers[i].poslen
+ && minwords > covers[i].curlen)))
+ {
+ maxitems = covers[i].poslen;
+ minwords = covers[i].curlen;
+ minI = i;
+ }
+ }
+ /* if a cover was found mark it */
+ if (minI >= 0)
+ {
+ covers[minI].in = 1;
+ /* adjust the size of cover */
+ startpos = covers[minI].startpos;
+ endpos = covers[minI].endpos;
+ curlen = covers[minI].curlen;
+ /* stretch the cover if cover size is lower than max_words */
+ if (curlen < max_words)
+ {
+ /* divide the stretch on both sides of cover */
+ maxstretch = (max_words - curlen)/2;
+ /* first stretch the startpos
+ * stop stretching if
+ * 1. we hit the beginning of document
+ * 2. exceed maxstretch
+ * 3. we hit an already marked fragment
+ */
+ stretch = 0;
+ posmarker = startpos;
+ for (i = startpos - 1; i >= 0 && stretch < maxstretch && !prs->words[i].in; i--)
+ {
+ if (!NONWORDTOKEN(prs->words[i].type))
+ {
+ curlen ++;
+ stretch ++;
+ }
+ posmarker = i;
+ }
+ /* cut back startpos till we find a non short token */
+ for (i = posmarker; i < startpos && (NOENDTOKEN(prs->words[i].type) || prs->words[i].len <= shortword); i++)
+ {
+ if (!NONWORDTOKEN(prs->words[i].type))
+ curlen --;
+ }
+ startpos = i;
+ /* now stretch the endpos as much as possible*/
+ posmarker = endpos;
+ for (i = endpos + 1; i < prs->curwords && curlen < max_words && !prs->words[i].in; i++)
+ {
+ if (!NONWORDTOKEN(prs->words[i].type))
+ curlen ++;
+ posmarker = i;
+ }
+ /* cut back endpos till we find a non-short token */
+ for ( i = posmarker; i > endpos && (NOENDTOKEN(prs->words[i].type) || prs->words[i].len <= shortword); i--)
+ {
+ if (!NONWORDTOKEN(prs->words[i].type))
+ curlen --;
+ }
+ endpos = i;
+ }
+ covers[minI].startpos = startpos;
+ covers[minI].endpos = endpos;
+ covers[minI].curlen = curlen;
+ /* Mark the chosen fragments (covers) */
+ mark_fragment(prs, highlight, startpos, endpos);
+ num_f ++;
+ /* exclude overlapping covers */
+ for (i = 0; i < numcovers; i ++)
+ {
+ if (i != minI && ( (covers[i].startpos >= covers[minI].startpos && covers[i].startpos <= covers[minI].endpos) || (covers[i].endpos >= covers[minI].startpos && covers[i].endpos <= covers[minI].endpos)))
+ covers[i].excluded = 1;
+ }
+ }
+ else
+ break;
+ }
+
+ /* show at least min_words if we have not marked anything */
+ if (num_f <= 0)
+ {
+ startpos = endpos = curlen = 0;
+ for (i = 0; i < prs->curwords && curlen < min_words; i++)
+ {
+ if (!NONWORDTOKEN(prs->words[i].type))
+ curlen++;
+ endpos = i;
+ }
+ mark_fragment(prs, highlight, startpos, endpos);
+ }
+ pfree(covers);
+ }
+ static void
+ mark_hl_words(HeadlineParsedText *prs, TSQuery query, int highlight,
+ int shortword, int min_words, int max_words)
+ {
int p = 0,
q = 0;
int bestb = -1,
***************
*** 1707,1762 ****
curlen;

int i;
- int highlight = 0;
- ListCell *l;
-
- /* config */
- prs->startsel = NULL;
- prs->stopsel = NULL;
- foreach(l, prsoptions)
- {
- DefElem *defel = (DefElem *) lfirst(l);
- char *val = defGetString(defel);
-
- if (pg_strcasecmp(defel->defname, "MaxWords") == 0)
- max_words = pg_atoi(val, sizeof(int32), 0);
- else if (pg_strcasecmp(defel->defname, "MinWords") == 0)
- min_words = pg_atoi(val, sizeof(int32), 0);
- else if (pg_strcasecmp(defel->defname, "ShortWord") == 0)
- shortword = pg_atoi(val, sizeof(int32), 0);
- else if (pg_strcasecmp(defel->defname, "StartSel") == 0)
- prs->startsel = pstrdup(val);
- else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
- prs->stopsel = pstrdup(val);
- else if (pg_strcasecmp(defel->defname, "HighlightAll") == 0)
- highlight = (pg_strcasecmp(val, "1") == 0 ||
- pg_strcasecmp(val, "on") == 0 ||
- pg_strcasecmp(val, "true") == 0 ||
- pg_strcasecmp(val, "t") == 0 ||
- pg_strcasecmp(val, "y") == 0 ||
- pg_strcasecmp(val, "yes") == 0);
- else
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized headline parameter: \"%s\"",
- defel->defname)));
- }

if (highlight == 0)
{
- if (min_words >= max_words)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("MinWords should be less than MaxWords")));
- if (min_words <= 0)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("MinWords should be positive")));
- if (shortword < 0)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("ShortWord should be >= 0")));
-
while (hlCover(prs, query, &p, &q))
{
/* find cover len in words */
--- 1947,1955 ----
***************
*** 1877,1882 ****
--- 2070,2152 ----
prs->words[i].in = (prs->words[i].repeated) ? 0 : 1;
}

+ }
+
+ Datum
+ prsd_headline(PG_FUNCTION_ARGS)
+ {
+ HeadlineParsedText *prs = (HeadlineParsedText *) PG_GETARG_POINTER(0);
+ List *prsoptions = (List *) PG_GETARG_POINTER(1);
+ TSQuery query = PG_GETARG_TSQUERY(2);
+
+ /* from opt + start and and tag */
+ int min_words = 15;
+ int max_words = 35;
+ int shortword = 3;
+ int max_fragments = 0;
+ int highlight = 0;
+ ListCell *l;
+
+ /* config */
+ prs->startsel = NULL;
+ prs->stopsel = NULL;
+ foreach(l, prsoptions)
+ {
+ DefElem *defel = (DefElem *) lfirst(l);
+ char *val = defGetString(defel);
+
+ if (pg_strcasecmp(defel->defname, "MaxWords") == 0)
+ max_words = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "MinWords") == 0)
+ min_words = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "ShortWord") == 0)
+ shortword = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "MaxFragments") == 0)
+ max_fragments = pg_atoi(val, sizeof(int32), 0);
+ else if (pg_strcasecmp(defel->defname, "StartSel") == 0)
+ prs->startsel = pstrdup(val);
+ else if (pg_strcasecmp(defel->defname, "StopSel") == 0)
+ prs->stopsel = pstrdup(val);
+ else if (pg_strcasecmp(defel->defname, "HighlightAll") == 0)
+ highlight = (pg_strcasecmp(val, "1") == 0 ||
+ pg_strcasecmp(val, "on") == 0 ||
+ pg_strcasecmp(val, "true") == 0 ||
+ pg_strcasecmp(val, "t") == 0 ||
+ pg_strcasecmp(val, "y") == 0 ||
+ pg_strcasecmp(val, "yes") == 0);
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized headline parameter: \"%s\"",
+ defel->defname)));
+ }
+
+ if (highlight == 0)
+ {
+ if (min_words >= max_words)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("MinWords should be less than MaxWords")));
+ if (min_words <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("MinWords should be positive")));
+ if (shortword < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ShortWord should be >= 0")));
+ if (max_fragments < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("MaxFragments should be >= 0")));
+ }
+
+ if (max_fragments == 0)
+ /* call the default headline generator */
+ mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
+ else
+ mark_hl_fragments(prs, query, highlight, shortword, min_words, max_words, max_fragments);
+
if (!prs->startsel)
prs->startsel = pstrdup("<b>");
if (!prs->stopsel)
***************
*** 1886,1888 ****
--- 2156,2159 ----

PG_RETURN_POINTER(prs);
}
+
-- does it work

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy
query and optionally return them in some order. Most common case: Find
documents containing all query terms and return them in order of their
similarity to the query.', to_tsquery('english', 'documents'),
'MaxFragments=1');

ts_headline
----------------------------------------------------------------------
... purpose of FTS is to find <b>documents</b>, which satisfy query and
optionally return them in some order. Most common case: Find <b>documents</b>
containing all query terms and return them in order of their similarity
(1 row)

-- does it respect MinWords

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'nosuchterm'), 'MaxFragments=1,MinWords=5');
ts_headline
-----------------------
The purpose of FTS is


-- does it respect MaxWords

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'document'), 'MaxFragments=1,MinWords=5,MaxWords=8');
ts_headline
----------------------------------------------------------------------
... find <b>documents</b>, which satisfy query and optionally return

-- does it exclude ShortWord in the end ( "in" is excluded)

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'document'), 'MaxFragments=1,MinWords=5,MaxWords=10');
ts_headline
---------------------------------------------------------------------------
... find <b>documents</b>, which satisfy query and optionally return them


-- does it exclude ShortWord in the front ( "The" is excluded)

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'document'), 'MaxFragments=1,MinWords=5,MaxWords=13');
ts_headline
-------------------------------------------------------------------------------------------
... purpose of FTS is to find <b>documents</b>, which satisfy query and optionally return
(1 row)


-- when multiple words are used, the cover is shown in middle of the fragment (cover size <= MaxWords)

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'optional & order'), 'MaxFragments=1,MinWords=5,MaxWords=10');
ts_headline
-------------------------------------------------------------------------------
... query and <b>optionally</b> return them in some <b>order</b>. Most common


-- does it choose the smallest cover (there are three covers between positions (7,17), (17, 22), and (22, 31). The chosen one is (17, 22))

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'order & documents'), 'MaxFragments=1,MinWords=5,MaxWords=10');
ts_headline
-------------------------------------------------------------------------------------
... some <b>order</b>. Most common case: Find <b>documents</b> containing all query
(1 row)

-- does it show multiple fragments

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'query & documents'), 'MaxFragments=2,MinWords=5,MaxWords=10');

ts_headline
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
... find <b>documents</b>, which satisfy <b>query</b> and optionally return them... common case: Find <b>documents</b> containing all <b>query</b> terms and return
(1 row)

-- does it exclude overlapping covers (even when MaxFragments = 2, the overlapping covers are excluded)

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'query & order & documents'), 'MaxFragments=2,MinWords=5,MaxWords=15');
ts_headline
---------------------------------------------------------------------------------------------------------------------
... them in some <b>order</b>. Most common case: Find <b>documents</b> containing all <b>query</b> terms and return
(1 row)


-- when cover size is greater than MaxWords, does it break covers into fragments (first with MaxFragments = 1 and then with maxFragments = 2)

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'purpose & similarity'), 'MaxFragments=1,MinWords=5,MaxWords=10');
ts_headline
---------------------------------------------------------------------
... <b>purpose</b> of FTS is to find documents, which satisfy query
(1 row)

testdb=# select ts_headline('The purpose of FTS is to find documents, which satisfy query and optionally return them in some order. Most common case: Find documents containing all query terms and return them in order of their similarity to the query.', to_tsquery('english', 'purpose & similarity'), 'MaxFragments=2,MinWords=5,MaxWords=10');
ts_headline
----------------------------------------------------------------------------------------------------------------------
... <b>purpose</b> of FTS is to find documents, which satisfy query... order of their <b>similarity</b> to the query
(1 row)

-- using Oleg's suggestions for testing some boundary cases

testdb=# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery, 'MaxFragments=1');
ts_headline
-----------------------
... <b>3</b> <b>1</b>
(1 row)

testdb=# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery, 'MaxFragments=2');
ts_headline
------------------------------------------
<b>1</b> 2 <b>3</b>... <b>3</b> <b>1</b>

testdb=# select ts_headline('1 2 3 4 5 1 2 3 1','1&2'::tsquery, 'MaxFragments=2');
ts_headline
----------------------------------------
<b>1</b> <b>2</b>... <b>1</b> <b>2</b>

testdb=# select ts_headline('1 2 3 4 5 1 2 3 1','2'::tsquery, 'MaxFragments=2');
ts_headline
--------------------------
... <b>2</b>... <b>2</b>

testdb=# select ts_headline('1 2 3 4 5 1 2 3 1','1&4'::tsquery, 'MaxFragments=2');
ts_headline
-------------------------
... <b>4</b> 5 <b>1</b>
(1 row)

Index: src/test/regress/sql/tsearch.sql
===================================================================
RCS file: /home/postgres/devel/pgsql-cvs/pgsql/src/test/regress/sql/tsearch.sql,v
retrieving revision 1.9
diff -c -r1.9 tsearch.sql
*** src/test/regress/sql/tsearch.sql 16 May 2008 16:31:02 -0000 1.9
--- src/test/regress/sql/tsearch.sql 18 Jul 2008 00:57:14 -0000
***************
*** 208,213 ****
--- 208,253 ----
</html>',
to_tsquery('english', 'sea&foo'), 'HighlightAll=true');

+ --Check if headline fragments work
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'ocean'), 'MaxFragments=1');
+
+ --Check if more than one fragment is displayed
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'Coleridge & stuck'), 'MaxFragments=2');
+
+ --Fragments when there all query words are not in the document
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'ocean & seahorse'), 'MaxFragments=1');
+
+
--Rewrite sub system

CREATE TABLE test_tsquery (txtkeyword TEXT, txtsample TEXT);
Index: src/test/regress/expected/tsearch.out
===================================================================
RCS file: /home/postgres/devel/pgsql-cvs/pgsql/src/test/regress/expected/tsearch.out,v
retrieving revision 1.14
diff -c -r1.14 tsearch.out
*** src/test/regress/expected/tsearch.out 16 May 2008 16:31:02 -0000 1.14
--- src/test/regress/expected/tsearch.out 18 Jul 2008 00:58:28 -0000
***************
*** 632,637 ****
--- 632,705 ----
</html>
(1 row)

+ --Check if headline fragments work
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'ocean'), 'MaxFragments=1');
+ ts_headline
+ ------------------------------------
+ ... after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted <b>Ocean</b>.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop
+ (1 row)
+
+ --Check if more than one fragments are displayed
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'Coleridge & stuck'), 'MaxFragments=2');
+ ts_headline
+ ---------------------------------------------
+ ... after day, day after day,
+ We <b>stuck</b>, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where... drop to drink.
+ S. T. <b>Coleridge</b>
+ (1 row)
+
+ --Fragments when there all query words are not in the document
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as a painted Ship
+ Upon a painted Ocean.
+ Water, water, every where
+ And all the boards did shrink;
+ Water, water, every where,
+ Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'ocean & seahorse'), 'MaxFragments=1');
+ ts_headline
+ ------------------------------------
+
+ Day after day, day after day,
+ We stuck, nor breath nor motion,
+ As idle as
+ (1 row)
+
--Rewrite sub system
CREATE TABLE test_tsquery (txtkeyword TEXT, txtsample TEXT);
\set ECHO none
Fixed some off-by-one errors pointed out by Oleg, and errors in excluding
overlapping fragments.

Also adding test queries and updating regression tests.

Let me know of any other changes that are needed.

-Sushant.

On Thu, 2008-07-17 at 03:28 +0400, Oleg Bartunov wrote:
> On Wed, 16 Jul 2008, Sushant Sinha wrote:
>
> > I will add test queries and their results for the corner cases in a
> > separate file. I guess the only thing I am confused about is what should
> > be the behavior of headline generation when Query items have words of
> > size less than ShortWord. I guess the answer is to ignore the ShortWord
> > parameter, but let me know if the answer is any different.
> >
>
> ShortWord applies to the headline text; it doesn't affect words in the query,
> so you can't discard them from the query.
>
> > -Sushant.
> >
> > On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
> >> Sushant,
> >>
> >> first, please provide simple test queries which demonstrate the right behavior
> >> in the corner cases. This will help reviewers test your patch and
> >> help you make sure your new version is ok. For example:
> >>
> >> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
> >> ts_headline
> >> ------------------------------------------------------
> >> <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
> >>
> >> This select breaks your code:
> >>
> >> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
> >> ts_headline
> >> --------------
> >> ... 2 ...
> >>
> >> and so on ....
> >>
> >>
> >> Oleg
> >> On Tue, 15 Jul 2008, Sushant Sinha wrote:
> >>
> >>> Attached a new patch that:
> >>>
> >>> 1. fixes the previous bug
> >>> 2. better handles the case when the cover size is greater than MaxWords.
> >>> Basically it divides a cover larger than MaxWords into fragments of
> >>> MaxWords, resizes each fragment so that each end of the fragment
> >>> contains a query word, and then picks the best fragments based on the
> >>> number of query words in each fragment. In case of a tie it picks the
> >>> smaller fragment. This shows more query words across multiple fragments
> >>> when a single cover is larger than MaxWords.
> >>>
> >>> The resizing of a fragment such that each end is a query word provides room
> >>> for stretching both sides of the fragment. This (hopefully) better presents
> >>> the context in which query words appear in the document. If a cover is
> >>> smaller than MaxWords then the cover is treated as a fragment.
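The splitting and scoring described above can be sketched roughly in Python. This is my simplification for illustration, not the patch's actual C code: the function name, the word-position data layout, and the stable tie-break are all assumptions.

```python
def pick_fragments(cover_start, cover_end, query_positions, max_words, max_fragments):
    """Rough sketch of the fragment selection described in the thread.

    A 'cover' is a span of word positions containing the query words.
    Positions in query_positions mark where query words occur.
    """
    # A cover no larger than MaxWords is itself a fragment.
    if cover_end - cover_start + 1 <= max_words:
        return [(cover_start, cover_end)]

    # Split a larger cover into chunks of at most max_words words.
    chunks = []
    pos = cover_start
    while pos <= cover_end:
        chunks.append((pos, min(pos + max_words - 1, cover_end)))
        pos += max_words

    # Shrink each chunk so both ends land on a query word, and count
    # how many query words each resulting fragment holds.
    fragments = []
    for lo, hi in chunks:
        inside = [p for p in query_positions if lo <= p <= hi]
        if inside:
            fragments.append((min(inside), max(inside), len(inside)))

    # Best fragments first: most query words; ties go to the shorter span.
    fragments.sort(key=lambda f: (-f[2], f[1] - f[0]))
    return [(lo, hi) for lo, hi, _ in fragments[:max_fragments]]
```

For example, a cover of words 0..9 with query words at positions 0, 2, 5, 7, 8 and MaxWords=4 yields the chunks (0,3), (4,7), (8,9); after trimming to query-word boundaries and scoring, the two best fragments are (0,2) and (5,7).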
> >>>
> >>> Let me know if you have any more suggestions or anything is not clear.
> >>>
> >>> I have not yet added the regression tests. The regression test suite seemed
> >>> to be only ensuring that the function works. How many tests should I be
> >>> adding? Is there any other place that I need to add different test cases for
> >>> the function?
> >>>
> >>> -Sushant.
> >>>
> >>>
> >>> Nice. But it will be good to resolve the following issues:
> >>>> 1) The patch contains mistakes; I didn't investigate or read it carefully. Get
> >>>> http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and load it into a db.
> >>>>
> >>>> Queries
> >>>> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >>>> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
> >>>>
> >>>> and
> >>>>
> >>>> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >>>> from apod;
> >>>>
> >>>> crash postgresql :(
> >>>>
> >>>> 2) Please include documentation and regression tests in your patch.
> >>>>
> >>>>
> >>>>> Another change that I was thinking:
> >>>>>
> >>>>> Right now if cover size > max_words then I just cut the trailing words.
> >>>>> Instead I was thinking that we should split the cover into more
> >>>>> fragments such that each fragment contains a few query words. Then each
> >>>>> fragment will not contain all query words but will show more occurrences
> >>>>> of query words in the headline. I would like to know what your opinion
> >>>>> on this is.
> >>>>>
> >>>>
> >>>> Agreed.
> >>>>
> >>>>
> >>>> --
> >>>> Teodor Sigaev E-mail: teodor@sigaev.ru
> >>>> WWW:
> >>>> http://www.sigaev.ru/
> >>>>
> >>>
> >>
> >> Regards,
> >> Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >
>
> Regards,
> Oleg

Re: [NOVICE] updating table where one date field is null

On Fri, 18 Jul 2008 08:34:14 am Stephan Szabo wrote:
> On Fri, 18 Jul 2008, richard terry wrote:
> > I'm passing some information to postgres (from gambas basic), which gets
> > the data from a date type
> >
> > eg date1 as date
> > date2 as date
> >
> > (IN the postgres database the default on the date field is null)
> >
> > my problem is that postgres baulks when I've not put a date into one of
> > the gambas date variables, with the message:
> >
> > "invalid syntax for type date '00:00:00'"
> >
> > my sql is far more complex but reduced to the date part reads something
> > like this
> >
> > insert into mytable (date1, date2) values
> >
> > ('01/01/2008','00:00:00')
>
> '00:00:00' isn't a date. Are you sure that the second isn't meant to be a
> time?
Interesting observation, I'll ask on the gambas list. In gambas one declares a
date type as above, and 00:00:00 is evidently the default value of an
empty date for some reason (or, as you say, a date/time). Thanks for replying;
at least this puts me on a track.

Regards

Richard

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Re: [HACKERS] [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 7:37 PM, Dann Corbit <DCorbit@connx.com> wrote:

>> In which cases do we expect that hash indexes will beat btrees?
>
> Large table unique index equality search should be very fast with hashed
> index (and the only place where any advantage will be seen).

Yes, this is the exact use case. Likewise, Dann has provided a good
description of the primary implementation goals of a disk-based
hash table.
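For illustration, the workload being described looks something like the following; the table and column names here are hypothetical, not from the thread:

```sql
CREATE TABLE accounts (id bigint NOT NULL, payload text);
CREATE INDEX accounts_id_hash ON accounts USING hash (id);

-- A hash index can only serve equality probes like this one;
-- it offers no help for range scans or ORDER BY, which is why
-- it must beat a btree on this query to be worth having.
SELECT payload FROM accounts WHERE id = 42;
```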

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH]-hash index improving

On Thu, Jul 17, 2008 at 02:11:20PM -0400, Alvaro Herrera wrote:
> Kenneth Marshall wrote:
> > On Thu, Jul 17, 2008 at 12:42:39PM -0400, Alvaro Herrera wrote:
>
> > > I think having the HASHVALUE_ONLY define is not a good idea --
> > > it just makes the patch harder to read. I suggest just removing
> > > the old code and putting the new code in place. (That's why we
> > > have revision control.)
> > >
> > One thing it helps with is building an old version and a new version
> > for comparative testing. Otherwise, you could end up with an
> > apples-to-oranges comparison. I certainly think that the final
> > patch should not have it, but it is useful now for testing and
> > comparisons.
>
> For this purpose I think it would be easier to have a separate tree
> with the patch, and one without it.

Here's one tree. Anyone can get an initial copy as follows:

git clone http://git.postgresql.org/git/~davidfetter/hash/.git

Xiao Meng, if you let me know where your git repo is, say by cloning
onto a machine I can see from the internet and applying your patches
to it, I can pull and let others see it :)

Yes, I know it's a little cumbersome, but we'll get something slicker
as we figure out what "slicker" really should mean.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
