Monday, July 7, 2008

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:
> On Jul 7, 2008, at 16:58, Tom Lane wrote:
>> If that's so, you certainly can't use strncmp, because that would
>> result
>> in sort orderings totally different from lower()'s result. Even
>> without
>> that argument, for most multibyte cases you'd get a pretty arbitrary,
>> user-unfriendly sort ordering.

> Now I'm confused again. :-( Whether or not I use strncmp() or
> varstr_cmp(), I first lowercase the value to be compared using
> str_tolower(). What Zdenek has said is, that aside, just as for the
> TEXT type, I should use strncmp() for = and <>, and varstr_cmp() for
> everything else. Are you saying something different?

No, but you were: you proposed using strncmp for everything.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Data type OID numbers fixed?

Stephen R. van den Berg wrote:

> How large is the probability of these numbers ever changing?

Really small. I'd just hardcode them.


--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] \SET QUIET and \timing

On Jul 7, 2008, at 17:14, Tom Lane wrote:

> "David E. Wheeler" <david@kineticode.com> writes:
>> Relatedly, this doesn't seem to turn off quiet mode:
>
>> \SET QUIET 0
>
> In current releases any value other than "off" (case insensitive)
> is taken as "true". IIRC this was improved recently.

Cute. Thanks.

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] \SET QUIET and \timing

"David E. Wheeler" <david@kineticode.com> writes:
> Relatedly, this doesn't seem to turn off quiet mode:

> \SET QUIET 0

In current releases any value other than "off" (case insensitive)
is taken as "true". IIRC this was improved recently.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] [pg_gen] what was the "issue" with the 8.3.2 bundle ?

spellberg_robert <emailrob@emailrob.com> writes:
> I was wondering: what was it?
> Further, is this, perchance, described in excruciating detail
> in a location where I did not tread?

Did you read the 8.3.3 release notes?

http://www.postgresql.org/docs/8.3/static/release-8-3-3.html

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[HACKERS] Data type OID numbers fixed?

I'm in the process of finishing a native protocol-3 PostgreSQL driver
written in Pike.
While trying to make it take intelligent decisions about binary versus
text encoding for different columns and parameters, I ran into the
question of type OIDs.

In looking through the specs, I decided that the following type OIDs
will be passed in binary automatically (both as results and as
parameters); all others will be passed as text:

case 16: // _bool
case 17: // _bytea
case 18: // _char
case 20: // _int8
case 21: // _int2
case 23: // _int4
case 25: // _text
case 142: // _xml
case 829: // _macaddr
case 869: // _inet
case 1042: // _bpchar
case 1043: // _varchar
case 1700: // _numeric
case 2950: // _uuid

How large is the probability of these numbers ever changing?
Should I extract them from a query at the beginning of the connection?
Or can I leave them hardcoded in the driver?
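For reference, if extracting them at the start of a connection, a query
along these lines would do it (just a sketch):

SELECT oid, typname
  FROM pg_type
 WHERE typname IN ('bool', 'bytea', 'char', 'int8', 'int2', 'int4', 'text',
                   'xml', 'macaddr', 'inet', 'bpchar', 'varchar', 'numeric',
                   'uuid');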
--
Sincerely,
Stephen R. van den Berg.

A truly wise man never plays leapfrog with a unicorn.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 16:58, Tom Lane wrote:

> "David E. Wheeler" <david@kineticode.com> writes:
>> Hrm. So in your opinion, strncmp() could be used for all comparisons
>> by citext, rather than varstr_cmp()?
>
> I thought the charter of this type was to work like lower(textcol).

Correct.

> If that's so, you certainly can't use strncmp, because that would
> result
> in sort orderings totally different from lower()'s result. Even
> without
> that argument, for most multibyte cases you'd get a pretty arbitrary,
> user-unfriendly sort ordering.

Now I'm confused again. :-( Whether or not I use strncmp() or
varstr_cmp(), I first lowercase the value to be compared using
str_tolower(). What Zdenek has said is, that aside, just as for the
TEXT type, I should use strncmp() for = and <>, and varstr_cmp() for
everything else. Are you saying something different?

I could use some examples to play with in order to ensure that things
are behaving as they should. I'll add regression tests for them.
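For instance (sketches only; the multibyte case depends on str_tolower()
and the locale):

SELECT 'FOO'::citext = 'foo'::citext;    -- expect true
SELECT 'FOO'::citext <> 'bar'::citext;   -- expect true
SELECT 'STRASSE'::citext = 'straße'::citext;  -- multibyte, locale-dependent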

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:
> Hrm. So in your opinion, strncmp() could be used for all comparisons
> by citext, rather than varstr_cmp()?

I thought the charter of this type was to work like lower(textcol).
If that's so, you certainly can't use strncmp, because that would result
in sort orderings totally different from lower()'s result. Even without
that argument, for most multibyte cases you'd get a pretty arbitrary,
user-unfriendly sort ordering.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0 v2

No, *really*

Sheesh, sorry.

David

Re: [JDBC] server-side prepared Statements


protocolVersion=V2
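(That is, as a connection URL parameter, something like
jdbc:postgresql://host:5432/db?protocolVersion=2; check your driver
version's documentation for the exact spelling it accepts.)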

Dave
On 7-Jul-08, at 11:39 AM, Thellmann, Uwe wrote:

Hi,

I have some trouble with prepared statements on my Postgres 8.1 using JDBC.
How can I prevent the server from using server-side prepared statements?
I'd like to disable those prepared statements.
I have tried to set prepareThreshold=0, but it doesn't work.

Who can help me?


Re: [HACKERS] PATCH: CITEXT 2.0 v2

And here is the script. D'oh!

Thanks,

David

New relkind (was Re: [HACKERS] Exposing quals)

On Mon, Jul 07, 2008 at 06:46:29PM -0400, Andrew Dunstan wrote:
> >
> > On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote:
> >
> >> Please find attached the patch, and thanks to Neil Conway and
> >> Korry Douglas for the code, and to Jan Wieck for helping me
> >> hammer out the scheme above. Mistakes are all mine ;)
> >
> > I see no negative comments to this patch on -hackers.
> >
> > This was discussed here
> > http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED
> > and I had understood the consensus to be that we would go ahead
> > with this?
> >
> > The notes say "Heikki doesn't think this is a long term solution",
> > but in the following discussion it was the *only* way of doing
> > this that will work with non-PostgreSQL databases. So it seems
> > like the way we would want to go, yes?
> >
> > So, can we add this to the CommitFest July page so it can receive
> > some substantial constructive/destructive comments?
> >
> > This could be an important feature in conjunction with Hot
> > Standby.
>
> The notes say at the end:
>
> "Jan thinks that showing the node tree will work better. But others
> don't agree with him -- it wouldn't work for PL/perlU. But Jan
> thinks it would work to give it a pointer to the parse tree and the
> range, we'd need to add an access function for the PL."
>
> For the record, I agree with Jan's suggestion of passing a pointer
> to the parse tree, and offline gave David a suggestion verbally as
> to how this could be handled for PL/PerlU.
>
> I don't think we should be tied too closely to a string
> representation, although possibly the first and simplest callback
> function would simply stringify the quals.

As I understand Jan's plan, the idea is to create a new relkind with
an exit to user code at leaf nodes in the plan tree. This would
require an API design for both user C code and for each PL to use, but
would then allow PostgreSQL's optimizer to work on JOINs, etc.

Jan, have I got that right so far? Do you have something in the way
of a rough patch, docs, etc. for this?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0 v2

On Jul 7, 2008, at 08:01, Andrew Dunstan wrote:

> What does still bother me is its performance. I'd like to know if
> any measurement has been done of using citext vs. a functional index
> on lower(foo).

Okay, here's a start. The attached script inserts random strings of
1-10 space-delimited words into text and citext columns, and then
compares the performance of queries with and without indexes. The
output for me is as follows:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 254.254 ms
SELECT * FROM try WHERE citext = 'food';
Time: 288.535 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.385 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 236.186 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 235.818 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 1.260 ms
SELECT * FROM try WHERE citext = 'food';
Time: 277.755 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.073 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 238.430 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 238.685 ms

So for some reason, after adding the indexes, the queries against the
CITEXT column aren't using them. Furthermore, the `lower(text) LIKE
lower(?)` query isn't using *its* index. Huh?

So this leaves me with two questions:

1. For what reason would the query against the citext column *not* use
the index?

2. Is there some way to get the CITEXT index to behave like a LOWER()
index, that is, so that its value is stored using the result of the
str_tolower() function, thus removing some of the overhead of
converting the values for each row fetched from the index? (Does this
question make any sense?)
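One guess I still need to verify: in a non-C locale a plain btree index
can't be used for LIKE prefix matching at all, and the usual workaround
is a text_pattern_ops index, something like:

CREATE INDEX try_lower_like_idx ON try (LOWER(text) text_pattern_ops);

That might explain the LIKE half of the mystery, at least.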

Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] \SET QUIET and \timing

On Jul 7, 2008, at 16:07, David E. Wheeler wrote:

> Howdy,
>
> I find that, in psql, if I set
>
> \SET QUIET 1
>
> Then, even with \timing toggled on, no timing info is output.
>
> Relatedly, this doesn't seem to turn off quiet mode:
>
> \SET QUIET 0
>
> Are these bugs?

I found that \set QUIET (on|off) works, but still, \timing has no
output when quiet is on.
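For the archives, the spelling that worked:

\set QUIET off
\timing
Timing is on.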

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] \SET QUIET and \timing

Howdy,

I find that, in psql, if I set

\SET QUIET 1

Then, even with \timing toggled on, no timing info is output.

Relatedly, this doesn't seem to turn off quiet mode:

\SET QUIET 0

Are these bugs?

Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] [pg_gen] what was the "issue" with the 8.3.2 bundle ?

I do believe this was discussed on -hackers or thereabouts. I'm not that
familiar with it myself, but that's where I'd start my search.

On Mon, Jul 7, 2008 at 1:51 PM, spellberg_robert <emailrob@emailrob.com> wrote:
On Mon, Jul 7, 2008 at 1:51 PM, spellberg_robert <emailrob@emailrob.com> wrote:
> Greetings, all ---
>
> In the announcement for the 8.3.3 bundle,
> about four weeks ago now,
> it was stated that the 8.3.2 bundle had "issues",
> but their nature was not described.
>
> It was also stated that
> those tarballs were only available for "a few days".
>
> So, the cat in me being curious,
> today I went looking in the -bugs list archives
> from May 30 to June 13, by date,
> examining only the initial post of each thread [to save time].
> I found no entries that refer to a member of the bundle in question.
> I did the same for the -general list archives, from June 13,
> but I stopped on June 5 when I found the post asking
> when 8.3.2 would be released.
>
> I was wondering: what was it?
> Further, is this, perchance, described in excruciating detail
> in a location where I did not tread?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Exposing quals

>
> On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote:
>
>> Please find attached the patch, and thanks to Neil Conway and Korry
>> Douglas for the code, and to Jan Wieck for helping me hammer out the
>> scheme above. Mistakes are all mine ;)
>
> I see no negative comments to this patch on -hackers.
>
> This was discussed here
> http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED
> and I had understood the consensus to be that we would go ahead with
> this?
>
> The notes say "Heikki doesn't think this is a long term solution", but
> in the following discussion it was the *only* way of doing this that
> will work with non-PostgreSQL databases. So it seems like the way we
> would want to go, yes?
>
> So, can we add this to the CommitFest July page so it can receive some
> substantial constructive/destructive comments?
>
> This could be an important feature in conjunction with Hot Standby.

The notes say at the end:

"Jan thinks that showing the node tree will work better. But others don't
agree with him -- it wouldn't work for PL/perlU. But Jan thinks it would
work to give it a pointer to the parse tree and the range, we'd need to
add an access function for the PL."

For the record, I agree with Jan's suggestion of passing a pointer to the
parse tree, and offline gave David a suggestion verbally as to how this
could be handled for PL/PerlU.


I don't think we should be tied too closely to a string representation,
although possibly the first and simplest callback function would simply
stringify the quals.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [pgsql-es-ayuda] Server time

Hugo César wrote:
> Hello list members!
>
> As the subject of the mail says, I'm writing to ask a question: what
> is the correct way to get the server time to match the PC's time? I
> use the functions intended for that purpose (CURRENT_TIME,
> LOCALTIMESTAMP...), but they all give me a five-hour difference from
> the PC's time...

Most likely it is misidentifying your operating system's time zone, and
therefore falling back to GMT. Try what happens if you put the name of
your time zone in TimeZone in postgresql.conf (I suppose it could be
America/Mexico_City or another).
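(For example, assuming that zone: either timezone = 'America/Mexico_City'
in postgresql.conf, or per session with SET TimeZone TO
'America/Mexico_City';)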

--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
--
TIP 4: Don't 'kill -9' the postmaster

[pgsql-es-ayuda] Server time

Hello list members!

As the subject of the mail says, I'm writing to ask a question: what
is the correct way to get the server time to match the PC's time? I
use the functions intended for that purpose (CURRENT_TIME,
LOCALTIMESTAMP...), but they all give me a five-hour difference from
the PC's time...

I searched the postgres list and found something saying to check the
TZ variable; does that refer to an environment variable? I'm working
on Windows XP with PostgreSQL 8.2; I already checked the variables and
couldn't find it anywhere.

I'm working with the Mexico time zone. Poking around I came across the
time functions, which leads me to the following... I don't know if
this is the right approach, or whether I can define my time zone
somewhere so I don't need this little trick:

SELECT to_char(CURRENT_TIMESTAMP - interval '5 hour','HH24:MI:SS') AS hora

With this little statement I get the correct PC time...


Another thing... What value do I have to put in the timezone variable
of the postgresql.conf file? Do I have to leave it as UNKNOWN?

Thanks in advance for any comments....
--
TIP 5: Have you read our extensive FAQ?

http://www.postgresql.org/docs/faqs.FAQ.html

Re: [PERFORM] Practical upper limits of pgbench read/write tps with 8.3

On Mon, 7 Jul 2008, Jeffrey Baker wrote:

> On the single 2.2GHz Athlon, the maximum tps seems to be 1450...what's
> the bottleneck? Is PG lock-bound?

It can become lock-bound if you don't make the database scale
significantly larger than the number of clients, but that's probably not
your problem. The pgbench client driver program itself is pretty CPU
intensive and can suffer badly from kernel issues. I am unsurprised you
can only hit 1450 with a single CPU. On systems with multiple CPUs where
the single CPU running the pgbench client is much faster than your 2.2GHz
Athlon, you'd probably be able to get a few thousand TPS, but eventually
the context switching of the client itself can become a bottleneck.

Running pgbench against a RAM disk is a good way to find out where the
system bottlenecks without disk I/O involved; you might try that test on
your larger server when you get a chance. One interesting thing you may
not have tried yet is running top and watching how close pgbench itself
gets to saturating a single CPU. If you've got 4 CPUs and the pgbench
client program shows 25% utilization, it is now the bottleneck rather
than whatever you thought you were measuring. I thought this might be
the case in the last test results you reported on Friday, but didn't
have a chance to comment on it until now.

One thing you can try here is running pgbench itself on a server other
than the one hosting the database, but that seems to top out at a few
thousand TPS as well; it may get higher than you've been seeing, though.
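Something along these lines, run from the second machine, would be a
starting point (a sketch; adjust the client and transaction counts, and
check the flags against your pgbench version):

pgbench -S -c 8 -t 10000 -h dbhost pgbench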

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[GENERAL] Query running slow but was running fine before


PG: PostgreSQL 8.1.4
OS: RHEL 4.x

I have a set of queries on a production server that have been running fine for the past few months but as of last Friday started performing poorly.  I have isolated the problem down to a particular part that is common to all queries involved and have provided an example here of what the issue is.  There are multiple tables where this is a problem, not just one.

An analyze is run on the tables every day (even several times a day because they are updated very frequently) and a vacuum analyze is run on the weekends.  I also tried to run an analyze specifically on the customer_id column and then the product_id column but that didn’t help.

This one table listed is part of a UNION ALL that joins ten separate tables.  The “real” query is against the view kda_log_info.  I am working with just one of the tables within the view to help narrow down where the problem might be.

There may be an occasion when the product_id value is null, so this is how the query is written.  It uses the index on customer_id and does a filter on product_id.  The cost is high and the time it takes to scan the 964 rows is about one minute on average (where before it would return in sub-second time):

kadams@hostserver> more t4.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 2070101833));


kadams@hostserver> psql -Upostgres -dkda_log -ft4.sql
                                                                                      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=426537.34..426537.35 rows=1 width=0)
   ->  Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24  (cost=0.00..426535.75 rows=635 width=0)
         Index Cond: ((customer_id = 7767664) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
         Filter: ((product_id IS NULL) OR (product_id = 2070101833))
(4 rows)


kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
 count
-------
   964
(1 row)


real    0m54.810s
user    0m0.002s
sys    0m0.001s


kda_log=# select count(*) from kda_log_info_2008w24;
  count   
----------
 16356303
(1 row)


Index:
kda_log_info_cid_cre_dom_2008w24_idx btree (customer_id, created, "domain")



On the same dataset, if I eliminate the IS NULL and look for just the specific product_id the optimizer picks the index on the product_id column and filters on customer_id and scans the same 964 rows in 2 milliseconds:


select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
--and ((product_id IS NULL) OR (product_id = 2070101833))
and product_id = 2070101833;

kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
                                                                                      QUERY PLAN                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1649.41..1649.42 rows=1 width=0)
   ->  Index Scan using kda_log_info_did_cre_2008w24_idx on kda_log_info_2008w24  (cost=0.00..1647.82 rows=635 width=0)
         Index Cond: ((product_id = 2070101833) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
         Filter: (customer_id = 7767664)
(4 rows)


kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
 count
-------
   964
(1 row)


real    0m0.207s
user    0m0.004s
sys    0m0.002s



There are currently no rows in the table where product_id is NULL:


mxl_log=# select count(*) from kda_log_info_2008w24 where product_id IS NULL;  
-------
     0
(1 row)


Index:
kda_log_info_did_cre_2008w24_idx btree (domain_id, created)




As another comparison I ran this same query on a different database server with the same database layout (different data set, of course) which returns the same approximate number of rows for the given customer_id / product_id and it returns in sub-second time. This table has even more rows in it than on the server where performance has tanked:


kadams@hostserver2> more t2.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 907
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 573351));


kadams@hostserver2> psql -Upostgres -dkda_log -ft2.sql
                                                                                    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2626.36..2626.37 rows=1 width=0)
   ->  Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24  (cost=0.00..2626.35 rows=1 width=0)
         Index Cond: ((customer_id = 907) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
         Filter: ((product_id IS NULL) OR (product_id = 573351))
(4 rows)


kadams@hostserver2> time psql -Upostgres -dmxl_log -ft2.sql
 count
-------
   992
(1 row)


real    0m0.011s
user    0m0.002s
sys    0m0.002s


kda_log=# select count(*) from kda_log_info_2008w24;
  count   
----------
 21777364
(1 row)



I compared postgresql.conf file settings between the servers and they are identical.  

The physical servers are also the same configuration (8 CPU, 8 GB RAM, local fast-SCSI disk).

So, I have no idea why the query in the top of this email on this particular database server has gone from sub-second response to over a minute on average.  As only part of the overall query, times have gone from a few seconds to 15-20 minutes each, which is causing major problems for our users.

Any idea what else I can look at or do to resolve this problem?  Any additional information I can provide to help you guys figure this out?
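For what it's worth, one rewrite I can try is splitting the OR so each
arm can use an index on its own; since the two arms are disjoint, the
sum equals the original count (a sketch, not yet verified here):

select (select count(*) from kda_log_info_2008w24
         where customer_id = 7767664
           and created >= '2008-06-01 00:00:00-06'::timestamp with time zone
           and created < '2008-07-01 00:00:00-06'::timestamp with time zone
           and product_id = 2070101833)
     + (select count(*) from kda_log_info_2008w24
         where customer_id = 7767664
           and created >= '2008-06-01 00:00:00-06'::timestamp with time zone
           and created < '2008-07-01 00:00:00-06'::timestamp with time zone
           and product_id is null) as count;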


PS: Yes, I know..... 8.1 is old..... We are migrating to 8.3.x in the fall.


Thanks for your help,

Keaton

Re: [HACKERS] the un-vacuumable table

On Mon, Jul 7, 2008 at 2:08 PM, Andrew Hammond
<andrew.george.hammond@gmail.com> wrote:
> On Mon, Jul 7, 2008 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, I suggest filing a bug with your kernel distributor --- ENOSPC was
>> a totally misleading error code here. Seems like EIO would be more
>> appropriate. They'll probably want to see the kernel log.
>
> It's FreeBSD 6.2 (yes, I know it's EoL). I'll submit a bug including this email.

http://www.freebsd.org/cgi/query-pr.cgi?pr=125382


Andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

Tom Lane wrote:
> target would typically be around 10 or so. It really wasn't designed to
> be industrial-strength code. (It was still a big improvement over what
> we'd had before :-(.) So I'm not very comfortable with taking it as the
> design base for something that does need to be industrial-strength.

I did some googling and found a paper describing a method of determining
most frequent elements from a stream called Lossy Counting:
http://www.vldb.org/conf/2002/S10P03.pdf
The following is an adaptation of an algorithm described in section 4.2
of that paper. I'll summarize it and propose an implementation using
existing PG data structures.

The data structure (called D) is a set of entries in the form (e, f, d),
where e is the lexeme, f is an integer representing the estimated
frequency and d is the maximum possible error in f. We process tsvectors
in batches. After each w tsvectors we will be pruning our data
structure, deleting entries that are not frequent enough. Let b be the
number of the current batch (tsvectors 1 to w are processed with b = 1,
tsvectors w+1 to w*2 are processed with b = 2 and so on). Observe that w
is an arbitrary number.
Initially, D is empty. For each lexeme e we look up its entry in D. If
it's found, we increment f in that entry. If not, we add a new entry in
the form of (e, 1, b - 1).
After each w tsvectors we delete all entries from D that have f + d <= b.

For example, consider such a list of tsvectors (numbers stand for
lexemes, each line is a tsvector):
1 2 3
2 3 4 5
1 2
1 3 4
2 3 4 5
2 4 5 6

Let w = 3.
After processing the first tsvector D would look like so:
(1, 1, 0), (2, 1, 0), (3, 1, 0)
After processing the second tsvector D would be:
(1, 1, 0), (2, 2, 0), (3, 2, 0), (4, 1, 0), (5, 1, 0)
After the third tsvector we'd have:
(1, 2, 0), (2, 3, 0), (3, 2, 0), (4, 1, 0), (5, 1, 0)

And now we'd do the pruning. b = 1, so all entries with f + d <= 1 are
discarded. D is then:
(1, 2, 0), (2, 3, 0), (3, 2, 0)

After processing the fourth tsvector we get:
(1, 3, 0), (2, 3, 0), (3, 3, 0), (4, 1, 1)
And then:
(1, 3, 0), (2, 4, 0), (3, 4, 0), (4, 2, 1), (5, 1, 1)
And finally:
(1, 3, 0), (2, 5, 0), (3, 4, 0), (4, 3, 1), (5, 2, 1), (6, 1, 1)

Now comes the time for the second pruning. b = 2, so D is left with:
(1, 3, 0), (2, 5, 0), (3, 4, 0), (4, 3, 1), (5, 2, 1)

Finally we return entries with the largest values of f, their number
depending on statistics_target. That ends the algorithm.

The original procedure from the paper prunes the entries after each
"element batch" is processed, but the batches are of equal size. With
tsvectors it seems sensible to divide lexemes into batches of different
sizes, so the pruning always takes place after all the lexemes from a
tsvector have been processed. I'm not totally sure if that change does
not destroy some properties of that algorithm, but it looks reasonably sane.

As to the implementation: like Tom suggested, we could use a hashtable
as D and an array of pointers to the hashtable entries, that would get
updated each time a new hashtable entry would be added. Pruning would
then require a qsort() of that table taking (f + d) as the sort key, but
other than that we could just add new entries at the end of the array.
An open question is: what size should that array be? I think that by
knowing how long is the longest tsvector we could come up with a good
estimate, but this needs to be thought over. Anyway, we can always
repalloc().

Getting the max tsvector length by looking at the datum widths would be
a neat trick. I just wonder: if we have to detoast all these tsvectors
anyway, is it not possible to do that and do two passes over already
detoasted datums? I've got to admit I don't really understand how
toasting works, so I'm probably completely wrong on that.

Does this algorithm sound sensible? I might not have understood some
things from that publication; they give some proofs of why their
algorithm yields good results and takes up little space, but I'm not
all that certain it can be so easily adapted.

If you think the Lossy Counting method has potential, I could test it
somehow. Using my current work I could extract a stream of lexemes as
ANALYZE sees it and run it through a python implementation of the
algorithm to see if the result makes sense.
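For reference, a minimal sketch of what that Python test implementation
could look like, following the (e, f, d) scheme above:

def lossy_count(tsvectors, w):
    # D maps lexeme e -> [f, d]; b is the current batch number.
    D = {}
    b = 1
    seen = 0
    for tsvector in tsvectors:
        for lexeme in tsvector:
            if lexeme in D:
                D[lexeme][0] += 1        # found: increment f
            else:
                D[lexeme] = [1, b - 1]   # new entry (e, 1, b - 1)
        seen += 1
        if seen % w == 0:                # prune after every w tsvectors
            for e in list(D):
                f, d = D[e]
                if f + d <= b:
                    del D[e]
            b += 1
    return D

# The worked example above: w = 3, six tsvectors of integer "lexemes".
ts = [[1, 2, 3], [2, 3, 4, 5], [1, 2], [1, 3, 4], [2, 3, 4, 5], [2, 4, 5, 6]]
print(lossy_count(ts, 3))
# prints {1: [3, 0], 2: [5, 0], 3: [4, 0], 4: [3, 1], 5: [2, 1]}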

This is getting more complicated, than I thought it would :^)

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing quals

On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote:

> Please find attached the patch, and thanks to Neil Conway and Korry
> Douglas for the code, and to Jan Wieck for helping me hammer out the
> scheme above. Mistakes are all mine ;)

I see no negative comments to this patch on -hackers.

This was discussed here
http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED
and I had understood the consensus to be that we would go ahead with
this?

The notes say "Heikki doesn't think this is a long term solution", but
in the following discussion it was the *only* way of doing this that
will work with non-PostgreSQL databases. So it seems like the way we
would want to go, yes?

So, can we add this to the CommitFest July page so it can receive some
substantial constructive/destructive comments?

This could be an important feature in conjunction with Hot Standby.

--
Simon Riggs

www.2ndQuadrant.com

PostgreSQL Training, Services and Support


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [pgsql-es-ayuda] Migrating a LATIN1 database to UTF

Germán C. Basisty wrote:

> I have a client who has been running a LATIN1 database on PostgreSQL 8.0
> for some time, and I want to migrate the engine to 8.3 while changing
> that database's encoding to UTF.
>
> What would be the elegant way to perform this maneuver? Backing up the
> database and then restoring does not work well; many Spanish-language
> symbols are interpreted incorrectly.

Do the pg_dump normally (in Latin1), and then

PGOPTIONS="-c client_encoding=latin1" pg_restore your-database-dump.dump

I think that should work.

(Obviously, a utf8 locale should have been chosen when running initdb
on 8.3.)

--
Alvaro Herrera

http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
--
TIP 2: you can unsubscribe from all the lists at once
(send "unregister YourEmailAddress" to majordomo@postgresql.org)

Re: [pgsql-es-ayuda] Migrating a LATIN1 database to UTF

Germán C. Basisty wrote:
>
> Esteemed colleagues:
>
> I have a client who has been running a LATIN1 database on PostgreSQL
> 8.0 for some time, and I want to migrate the engine to 8.3 while
> changing that database's encoding to UTF.
>
> What would be the elegant way to perform this maneuver? Backing up
> the database and then restoring does not work well; many
> Spanish-language symbols are interpreted incorrectly.
>
> Regards,
>
> Germán C. Basisty
> Estudio Informático Patagónico
> Consultant - Information Technology
> german.basisty@eipsistemas.ath.cx
> http://www.eipsistemas.ath.cx
>
Use SET client_encoding in the terminal and then do the import; that
way PostgreSQL takes care of the conversion to UTF8.
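For example, in psql, assuming a plain SQL dump (the file name here is a
placeholder):

SET client_encoding TO 'LATIN1';
\i your-dump.sql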

--
Regards,

Julio Cesar Sánchez González.

--
Now I have become Death, the destroyer of worlds.
I am Death who takes everything away, the source of things to come.

www.sistemasyconectividad.com.mx

http://darkavngr.blogspot.com/

--
TIP 7: don't forget to increase the "free space map" configuration

[HACKERS] 8.1 index corruption woes

#!/usr/bin/perl
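# Usage: feed a TOC on stdin, e.g. (script name and relfilenodes are
# just examples):
#
#   perl index_checker.pl <<EOF
#   table: 273350
#   index: 273375
#   EOF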

use warnings;
use strict;

my $PG_FILEDUMP = "pg_filedump-8.1";

process_table();
exit();

sub process_table {
    my ($table, @indexes) = read_toc();

    foreach my $elem ($table, @indexes) {
        die "file $elem does not exist" unless -f "$elem";
        if (!-f "$elem.dump") {
            print STDERR "generating pg_filedump for $elem\n";
            `$PG_FILEDUMP -i $elem > $elem.dump`;
        }
    }

    print STDERR "loading unused line pointers for table $table\n";
    my $unused = get_heap_unused($table);

    foreach my $index (@indexes) {
        print STDERR "processing index $index\n";
        process_index($unused, $index);
    }
}

# Reads a "toc" file, which is a description of a table and its indexes. A
# table line is "table: xxx" where xxx is the relfilenode of the table, and an
# index line is "index: xxx" where xxx is the relfilenode of the index.
sub read_toc {
    my $table;
    my @indexes;

    print STDERR "now expecting a TOC in stdin...\n";
    while (<>) {
        if (/^table: ([0-9]+)$/) {
            $table = $1;
        }
        if (/^index: ([0-9]+)$/) {
            push @indexes, $1;
        }
    }
    print STDERR "finished reading the TOC for table $table\n";
    print STDERR ("indexes: ", join(", ", @indexes), "\n");
    return $table, @indexes;
}

# Reads a pg_filedump -i report for the given table, and fetches into a hash
# all the line pointers that are marked "unused". Returns the hash so built.
sub get_heap_unused {
    my $table = shift;
    my $curblock;
    my $unused = {};

    my $infile = "$table.dump";
    open IN, "<", "$infile" or die "can't open $infile: $!";
    while (<IN>) {
        if (/^Block\s+([0-9]+) /) {
            $curblock = $1;
        }

        if (/^ Item\s+([0-9]+) -- Length:.*Flags: 0x00/) {
            push @{$unused->{$curblock}}, $1;
        }
    }

    return $unused;
}

# Given an index and a hash built by get_heap_unused, reads the pg_filedump -i
# report for that index and prints a list of all index pointers that point to
# any element in the hash.
sub process_index {
    my $unused = shift;
    my $index = shift;

    my $curblock;
    my $special = 0;
    my $isleaf = 0;
    my $collect = 0;
    my @lines = ();

    my $infile = "$index.dump";

    open IN, "<", "$infile" or die "can't open $infile: $!";
    while (<IN>) {
        if (/^Block\s+([0-9]+) /) {
            $curblock = $1;
        }
        if (/^<Data> --/) {
            $collect = 1;
            next;
        }

        if (/^<Special Section> -----$/) {
            $special = 1;
            next;
        }

        if ($collect) {
            push @lines, $_;
        }

        if ($special) {
            if (/^ Flags.*LEAF/) {
                $isleaf = 1;
            }
        }

        if (/^$/) {
            if ($special) {
                if ($isleaf) {
                    report_broken_block($unused,
                        $index, $curblock, @lines);
                }
                $isleaf = 0;
                $special = 0;
                @lines = ();
            }
        }
    }
}

# workhorse for process_index; gets an array of lines comprising an index
# block Data entries, and prints those that match the unused hash.
sub report_broken_block {
    my $unused = shift;
    my $indexid = shift;
    my $block = shift;
    my $item;

    foreach my $line (@_) {
        if ($line =~ /^ Item\s+([0-9]+)/) {
            $item = $1;
        }
        if ($line =~ /^ Block Id:\s+([0-9]+)\s+linp Index: ([0-9]+)/) {
            my $blk = $1;
            my $lp = $2;

            next unless defined $unused->{$blk};
            foreach my $hlp (@{$unused->{$blk}}) {
                next unless $lp eq $hlp;

                print "INDEX PTR TO UNUSED HEAP: " .
                    "index $indexid ($block,$item) -> ($blk, $lp)\n";
            }
        }
    }
}

Hi,

We've detected what I think is some sort of index corruption in 8.1.
The server is running 8.1.11, so AFAICT the problem with truncated pages
in vacuum is already patched and accounted for (i.e. we reindexed, and a
bit later the problem presented itself again). There haven't been any
relevant fixes after that AFAICT.

What we see is that after a bit of updating the index, it starts having
tuples that point to heap entries which are marked unused.

I detected one of these almost by accident, and then built a tool to
discover them by examining pg_filedump output. Basically what it does
is scan the heap, note which heap tuples are marked "unused", and then
scan the index and for each index tuple in leaf pages, see if it points
to an unused heap tuple. The number of occurrences is amazingly high.
Right after a reindex there isn't any occurrence; but after a while of
application load, a lot of them appear.

I catalogued this as index corruption: the theory is that as soon as the
unused heap tuple is reused, the index will have a pointer with the
wrong index key pointing to a live heap tuple.

(We saw an occurrence of this too, but I wasn't motivated enough to
write a tool to verify the data in index vs. heap tuples.)

However, seeing the high prevalence of the problem, I started
considering whether the tool is misreading the output anyway -- i.e.
maybe it's a bug in the tool, or a misconception on my part.

The tool output looks like this:

INDEX PTR TO UNUSED HEAP: index 273375 (78,18) -> (5530, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (96,84) -> (5436, 3)
INDEX PTR TO UNUSED HEAP: index 273375 (111,1) -> (1317, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (145,1) -> (1665, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (174,1) -> (2656, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (199,1) -> (1745, 21)
INDEX PTR TO UNUSED HEAP: index 273375 (207,1) -> (358, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (214,1) -> (2800, 17)
(many more lines of the same stuff)

What this means (first line) is that on index 273375, page 78, offset
18, there's a pointer to heap page 5530, offset 17; but that heap offset
shows up as unused. This is from the heap:

Block 5530 ********************************************************
<Header> -----
Block Offset: 0x02b34000 Offsets: Lower 136 (0x0088)
Block: Size 8192 Version 3 Upper 992 (0x03e0)
LSN: logid 130 recoff 0xd53b3090 Special 8192 (0x2000)
Items: 29 Free Space: 856
Length (including item array): 140

[ ... snip other items ...]

Item 17 -- Length: 0 Offset: 504 (0x01f8) Flags: 0x00

This is from the index:

Block 78 ********************************************************
<Header> -----
Block Offset: 0x0009c000 Offsets: Lower 788 (0x0314)
Block: Size 8192 Version 3 Upper 5104 (0x13f0)
LSN: logid 131 recoff 0x01b3c6e0 Special 8176 (0x1ff0)
Items: 192 Free Space: 4316
Length (including item array): 792

[ ... snip other items ... ]

Item 18 -- Length: 16 Offset: 5184 (0x1440) Flags: USED
Block Id: 5530 linp Index: 17 Size: 16
Has Nulls: 0 Has Varwidths: 0


Of course, we're very concerned about this problem. Personally I am
prepared to believe that this could be a hardware problem, because no
one else seems to be reporting this kind of thing, but I don't see how
it could cause this particular issue and not more wide-ranging data
corruption.

I would like comments on whether this is really a problem or I am just
misreading pg_filedump output. If we confirm this, we can try to
proceed to investigate it further.

I attach the checker tool in case it is of any interest.

Thanks,

--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [SQL] how to control the execution plan ?

On Mon, Jul 7, 2008 at 3:14 AM, Sabin Coanda <sabin.coanda@deuromedia.ro> wrote:
> Hi there,
>
> I try to execute the following statement:
>
> SELECT *
> FROM (
> SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A
> FROM "TABLE_A" bp
> JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
> JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
> WHERE pn."Editor"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL AND
> bp."COL_A"::text <> ''::text
> ) x
> WHERE (x.ALIAS_A::text ) IS NULL;
>
> The problem is the excution plan first make Seq Scan on "TABLE_A", with
> Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> ''::text) AND
> (("MY_FUNCTION_A"("COL_A", NULL::boolean))::text IS NULL))". This way,
> MY_FUNCTION_A crashes for some unsupported data provided by "COL_A".
>
> I'd like to get an execution plan which is filtering first the desired rows,
> and just after that compute te column value "MY_FUNCTION_A"(bp."COL_A",
> NULL::boolean).

Just wondering what the query plans look like here, both regular
explain, and if you can wait for it to execute, explain analyze.

I'm guessing that the function is not indexed / indexable. Is it
marked immutable (and is it actually immutable) or stable (and is
it actually stable)?

If it's immutable then you can create an index on it and that should
speed things up.
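For illustration with the names from the query above (assuming the
function really is immutable and is safe to evaluate over every stored
value):

CREATE INDEX table_a_func_idx
    ON "TABLE_A" ("MY_FUNCTION_A"("COL_A", NULL::boolean));

Postgres will refuse to build such an expression index unless the
function is declared IMMUTABLE.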

>
> I made different combinations, including a subquery like:
>
> SELECT *
> FROM (
> SELECT "MY_FUNCTION_A"(y."COL_A", NULL::boolean) AS ALIAS_A
> FROM (
> SELECT bp."COL_A"
> FROM "TABLE_A" bp
> JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
> JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
> WHERE pn."COL_E"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL
> AND bp."COL_A"::text <> ''::text
> ) y
> ) x
> WHERE (x.ALIAS_A::text ) IS NULL;
>
> but the postgres planner is too 'smart' and optimizes it as in the previous
> case, with the same Seq Scan on "TABLE_A", and with the same filter.
>
> I thought about changing the function MY_FUNCTION_A to support any
> argument data, but even then another performance problem would arise,
> because the function would be computed for every row in the join, even
> those that could be removed by other filters.
>
> Do you have a solution please ?

If it's still too smart, you can run two queries, one to pull the set
you want to work with from the custom function into a temp table, then
analyze it, then run the query against that.
Not an optimal solution, but it might be the fastest if you can't
index your function.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [HACKERS] the un-vacuumable table

On Mon, Jul 7, 2008 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
>>> On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Have you looked into the machine's kernel log to see if there is any
>>>> evidence of low-level distress (hardware or filesystem level)?
>
>> Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req
>> 0xffffffff929b9f88:6812 function 0
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0
>> 0 1 6c 99 9 c0 0 0 0 20 0 0
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus
>> device reset occurred
>> [etc]
>
>> I think this is a smoking gun.
>
> Yeah, sure looks like one. Time to replace that disk drive?

Well, that's an Apple XRaid device on the other side of an LSI
FiberChannel HBA. I'll see if it has any error messages in it's logs.

> Also, I suggest filing a bug with your kernel distributor --- ENOSPC was
> a totally misleading error code here. Seems like EIO would be more
> appropriate. They'll probably want to see the kernel log.

It's FreeBSD 6.2 (yes, I know it's EoL). I'll submit a bug including this email.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[PERFORM] Practical upper limits of pgbench read/write tps with 8.3

I'm spending a third day testing with the ioDrive, and it occurred to
me that I should normalize my tests by mounting the database on a
ramdisk. The results were surprisingly low. On the single 2.2GHz
Athlon, the maximum tps seems to be 1450. This is achieved with a
single connection. I/O rates to and from the ramdisk never exceed
50MB/s on a one-minute average.

With the flash device on the same benchmark, the tps rate is 1350,
meaning that as far as PostgreSQL is concerned, on this machine, the
flash device achieves 90% of the best possible performance.

Question being: what's the bottleneck? Is PG lock-bound?

-jwb

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 13:59, Gregory Stark wrote:

> Of course the obvious case of two equivalent strings with different
> bytes
> would be two strings which differ only in case in a collation which
> doesn't
> distinguish based on case. So you obviously can't take this route
> for citext.

Well, to be fair, citext isn't imposing a collation. It's just calling
str_tolower() on strings before passing them on to varstr_cmp() or
strncmp() to compare.

> I don't think you have to worry about the problem that caused
> Postgres to make
> this change. IIRC it was someone comparing strings like paths and
> usernames
> and getting false positives because they were in a Turkish locale
> which found
> certain sequences of characters to be insignificant for ordering.
> Someone
> who's using a citext data type has obviously decided that's
> precisely the kind
> of behaviour they want.

Hrm. So in your opinion, strncmp() could be used for all comparisons
by citext, rather than varstr_cmp()?

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[pdxpug] Sign up for PDXPUG PgDay! July 20, 2008

If you have trouble viewing or submitting this form, you can fill it out online:
http://spreadsheets.google.com/viewform?key=paoTJ9uEi8vIqumUzwzrCAw&email=true

Sign up for PDXPUG PgDay! July 20, 2008

Please sign up for free admission to the after-party at the Gotham Tavern, and to reserve a t-shirt. For attendance we are requesting a $20 donation, which will be given to Software in the Public Interest. You can pay with check or cash when you arrive at the conference.

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:

> On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
>
>> My question is: why? Shouldn't they all use the same function for
>> comparison? I'm happy to dupe this implementation for citext, but I don't
>> understand it. Should not all comparisons be executed consistently?
>
> Let me try to answer my own question by citing this comment:
>
> /*
> * Since we only care about equality or not-equality, we can avoid all
> the
> * expense of strcoll() here, and just do bitwise comparison.
> */
>
> So, the upshot is that the = and <> operators are not locale-aware, yes? They
> just do byte comparisons. Is that really the way it should be? I mean, could
> there not be strings that are equivalent but have different bytes?

There could be strings that strcoll returns 0 for even though they're not
identical. However, that caused problems in Postgres, so we decided that
only identical strings should actually compare equal. So if strcoll
returns 0 then we do a bytewise comparison to impose an arbitrary ordering.

Of course the obvious case of two equivalent strings with different bytes
would be two strings which differ only in case in a collation which doesn't
distinguish based on case. So you obviously can't take this route for citext.

I don't think you have to worry about the problem that caused Postgres to make
this change. IIRC it was someone comparing strings like paths and usernames
and getting false positives because they were in a Turkish locale which found
certain sequences of characters to be insignificant for ordering. Someone
who's using a citext data type has obviously decided that's precisely the kind
of behaviour they want.

--
Gregory Stark
EnterpriseDB

http://www.enterprisedb.com

Ask me about EnterpriseDB's RemoteDBA services!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [pgsql-es-ayuda] how to define tables so that queries are fast?

It's a matter of performance. I did NOT say that normalization had
anything to do with data types but, obviously, using char primary keys
would perform poorly....
Apologies if it came across wrong...

--
/*
## Emanuel CALVO FRANCO. ##
~-~- 3m4nuek ~-~-
*/
--
TIP 2: you can unsubscribe from all the lists at once
(send "unregister YourEmailAddress" to majordomo@postgresql.org)

Re: [GENERAL] To store and retrive image data in postgresql

On Jul 7, 2008, at 12:22 PM, aravind chandu wrote:
> I need to store an image in a PostgreSQL database and after
> that I need to retrieve the image back. Can you please help me
> figure out how to do this?

Assuming you mean an image as in a binary visual image (like a JPEG),
the data type you want is BYTEA. How to actually get BYTEA data into
and out of PostgreSQL depends on what client library and language you
are using.

If your application allows it, it is often far more efficient to
store a reference to the image in the database (such as a URI or file
path), rather than the image itself.
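For example, a minimal sketch of such a table (names are placeholders):

CREATE TABLE images (
    id       serial PRIMARY KEY,
    filename text,
    data     bytea   -- the raw image bytes
);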

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.
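For illustration: a grouping expression like "col IS NULL" can only come
out true or false, never NULL, so it contributes at most two groups
(table and column names below are placeholders):

SELECT (col IS NULL) AS col_is_null, count(*)
  FROM some_table
 GROUP BY (col IS NULL);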

Tags:
----
REL8_0_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.169.4.6 -> r1.169.4.7)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.169.4.6&r2=1.169.4.7)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL8_1_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.191.2.5 -> r1.191.2.6)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.191.2.5&r2=1.191.2.6)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL8_2_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.214.2.9 -> r1.214.2.10)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.214.2.9&r2=1.214.2.10)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL7_4_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.147.2.5 -> r1.147.2.6)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.147.2.5&r2=1.147.2.6)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL8_3_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.243 -> r1.243.2.1)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.243&r2=1.243.2.1)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.249 -> r1.250)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.249&r2=1.250)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[pgsql-es-ayuda] Migrating a LATIN1 database to UTF8

Dear colleagues:

I have a client who has been running a LATIN1 database on PostgreSQL 8.0 for
some time, and I want to migrate the server to 8.3 while also changing that
database's encoding to UTF8.

What would be an elegant way to carry out this maneuver? Backing up the
database and then restoring it does not work well; many Spanish-language
characters come out interpreted incorrectly.

Regards,

Germán C. Basisty
Estudio Informático Patagónico
Consultant - Information Technology
german.basisty@eipsistemas.ath.cx
http://www.eipsistemas.ath.cx
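
The usual recipe (an editor's sketch, not from the thread; database and file
names are hypothetical) is to dump with the newer pg_dump, forcing UTF8 as
the dump's client encoding, and restore into a database created as UTF8:

pg_dump -E UTF8 mydb > mydb.sql     # use 8.3's pg_dump; -E sets the encoding
createdb -E UTF8 mydb_utf8
psql -d mydb_utf8 -f mydb.sql

Garbled characters after a plain dump/restore often indicate that
client_encoding did not match the data during the restore.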


[pgsql-de-allgemein] == Weekly PostgreSQL Newsletter - July 06, 2008 ==

The original article can be found at:
http://people.planetpostgresql.org/dfetter/index.php?/archives/180-PostgreSQL-Weekly-News-July-06-2008.html


== Weekly PostgreSQL Newsletter - July 06, 2008 ==

The July commitfest has begun. Now get reviewing :)
http://wiki.postgresql.org/wiki/CommitFest:2008-07

== PostgreSQL Product News ==

MicroOLAP Database Designer 1.2.4 for PostgreSQL released.
http://microolap.com/products/database/postgresql-designer/

== PostgreSQL Jobs in July ==

http://archives.postgresql.org/pgsql-jobs/2008-07/threads.php

== PostgreSQL Local ==

pgDay Portland is on July 20, right before OSCON.
http://pugs.postgresql.org/node/400

PGCon Brazil 2008 will take place September 26-27 at Unicamp in
Campinas.
http://pgcon.postgresql.org.br/index.en.html

PGDay.IT 2008 will take place October 17 and 18 in Prato.
http://www.pgday.org/it/

== PostgreSQL in the News ==

Planet PostgreSQL: http://www.planetpostgresql.org/

General Bits, archives and occasional news articles:
http://www.varlena.com/GeneralBits/

This weekly PostgreSQL newsletter was put together by David
Fetter.

Send news and announcements by Sunday at 3:00pm Pacific time. Please
send English submissions to david@fetter.org, German ones to
pwn@pgug.de, Italian ones to pwn@itpug.org.

== Applied Patches ==

Heikki Linnakangas committed:

- Turn PGBE_ACTIVITY_SIZE into a GUC variable,
track_activity_query_size. As the buffer could now be a lot larger than
before, and copying it could thus be a lot more expensive than before,
use strcpy instead of memcpy to copy the query string, as was already
suggested in comments. Also, only copy the PgBackendStatus struct and
string if the slot is in use. Patch by Thomas Lee, with some changes
by me.

- Extend VacAttrStats to allow typanalyze functions to store statistic
values of different types than the underlying column. The capability
isn't yet used for anything, but will be required by upcoming patch
to analyze tsvector columns. Jan Urbanski

- In pgsql/src/bin/pg_dump/pg_dump.c, move volatility, language, etc.
modifiers before function body in the pg_dump output for CREATE
FUNCTION. This makes it easier to read especially if the function
body is long. Original idea and patch by Greg Sabino Mullane,
though this is a stripped down version of that.

Teodor Sigaev committed:

- ltree support for multibyte encodings. Patch was made by Weiping
(laser) He, with some editorialization by me.

- In pgsql/src/backend/access/gin/ginscan.c, fix initialization of
GinScanEntryData.partialMatch

Bruce Momjian committed:

- Add psql TODO item: "Add option to wrap column values at whitespace
boundaries, rather than chopping them at a fixed width. Currently,
'wrapped' format chops values into fixed widths. Perhaps the word
wrapping could use the same algorithm documented in the W3C
specification."

- Add psql TODO item: "Add 'auto' expanded mode that outputs in
expanded format if "wrapped" mode can't wrap the output to the
screen width."

- Fix recovery.conf boolean variables to take the same range of string
values as postgresql.conf.

- Issue psql connection warnings on connection start and via \c, per
observation by David Fetter.

- Add to TODO: "Fix TRUNCATE ... RESTART IDENTITY so its effect on
sequences is rolled back on transaction abort."

- Add URL for TODO: "Add database and transaction-level triggers."

- In pgsql/doc/src/sgml/config.sgml, documentation patch by Kevin L.
McBride explaining GUC lock variables, which are available if
LOCK_DEBUG is defined.

- In pgsql/src/include/c.h, update source code comment about when to
use gettext_noop().

Tom Lane committed:

- Teach autovacuum how to determine whether a temp table belongs to a
crashed backend. If so, send a LOG message to the postmaster log,
and if the table is beyond the vacuum-for-wraparound horizon,
forcibly drop it. Per recent discussions. Perhaps we ought to
back-patch this, but it probably needs to age a bit in HEAD first.

- In pgsql/src/timezone/pgtz.c, fix identify_system_timezone() so that
it tests the behavior of the system timezone setting in the current
year and for 100 years back, rather than always examining years
1904-2004. The original coding would have problems distinguishing
zones whose behavior diverged only after 2004; which is a situation
we will surely face sometime, if it's not out there already. In
passing, also prevent selection of the dummy "Factory" timezone,
even if that's exactly what the system is using. Reporting time as
GMT seems better than that.

- In pgsql/src/backend/utils/misc/guc.c, remove GUC extra_desc strings
that are redundant with the enum value lists.

- In pgsql/src/backend/utils/adt/xml.c, fix transaction-lifespan
memory leak in xpath(). Report by Matt Magoffin, fix by Kris Jurka.

- Fix psql's \d and allied commands to work with all server versions
back to 7.4. Guillaume Lelarge, with some additional fixes by me.

- Add a function pg_get_keywords() to let clients find out the set of
keywords known to the SQL parser. Dave Page. (See the example after
this list.)

- In pgsql/src/backend/utils/misc/guc.c, prevent integer overflows
during units conversion when displaying a GUC variable that has
units. Per report from Stefan Kaltenbrunner. Backport to 8.2. I
also backported my patch of 2007-06-21 that prevented comparable
overflows on the input side, since that now seems to have enough
field track record to be back-patched safely. That patch included
addition of hints listing the available unit names, which I did not
bother to strip out of it --- this will make a little more work for
the translators, but they can copy the translation from 8.3, and
anyway an untranslated hint is better than no hint.

Magnus Hagander committed:

- In pgsql/src/backend/utils/misc/guc.c, split apart
message_level_options into one set for server-side settings and one for
client-side, restoring the previous behaviour with different sort order
for the 'log' level. Also, remove redundant list of available options,
since the enum code will output it automatically.

- In pgsql/src/backend/utils/misc/guc.c, "debug" level was supposed to
be hidden, since it's just an alias for debug2.

- In pgsql/src/backend/port/win32_shmem.c, fix a couple of bugs in
win32 shmem name generation: 1. Don't cut off the prefix. With this
fix, it's again readable. 2. Properly store it in the Global
namespace as intended.

Joe Conway committed:

- When an ERROR happens on a dblink remote connection, take pains to
pass the ERROR message components locally, including using the
passed SQLSTATE. Also wrap the passed info in an appropriate
CONTEXT message. Addresses complaint by Henry Combrinck. Joe
Conway, with much good advice from Tom Lane.

Peter Eisentraut committed:

- In pgsql/src/test/regress/expected/prepare.out, clean up weird
whitespace. Separate patch to simplify the next change.

- Don't print the name of the database in psql \z.

- Don't refer to the database name "regression" inside the regression
test scripts, to allow running the test successfully with another
database name.
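
As a quick illustration of the pg_get_keywords() item above (an editor's
sketch, not part of the newsletter; the column names are from the committed
function as best recalled):

SELECT * FROM pg_get_keywords();
-- one row per parser keyword; columns: word, catcode, catdesc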

== Rejected Patches (so far) ==

No one was disappointed this week :-)

== Submitted Patches ==

Simon Riggs sent in a patch which introduces a distinction between
hint bit setting and block dirtying, when such a distinction can
safely be made.

Simon Riggs sent in a patch which adds planner statistic hooks.

Tom Raney sent in another revision of his patch to allow EXPLAIN to
output XML.

Peter Eisentraut sent in a patch to let people set the name of the
regression test database on the command line.

Teodor Sigaev sent in revisions to the multicolumn and fast-insert
patches to GIN.

Dean Rasheed sent in a patch to add a debug_explain_plan GUC variable
which, when set to on, dumps the output of EXPLAIN ANALYZE to the
appropriate logging level.

Zdenek Kotala sent in three more revisions of his page macros cleanup
patch.

Garick Hamlin sent in a patch to support ident authentication when
using unix domain sockets on Solaris.

Simon Riggs sent in a bug fix for pg_standby per note from Ferenc
Felhoffer.

Simon Riggs sent in two revisions of a patch to document the
toggliness of certain psql commands.

Simon Riggs sent in a patch to pgbench which restricts vacuuming to
pgbench tables and changes a DELETE/VACUUM to a TRUNCATE.

--
Andreas 'ads' Scherbaum
Deutsche PostgreSQL User Group: http://www.pgug.de
DPWN: http://ads.wars-nicht.de/blog/categories/18-PWN


--
Sent via pgsql-de-allgemein mailing list (pgsql-de-allgemein@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-de-allgemein

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:

>> So, the upshot is that the = and <> operators are not locale-aware,
>> yes? They just do byte comparisons. Is that really the way it
>> should be? I mean, could there not be strings that are equivalent
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized
> strings. But Postgres currently assumes that all UTF8 strings are
> normalized.

I see. So binary equivalence is okay, in that case.

> If you need to implement the < <= >= > operators, you need to use
> strcoll(), which takes care of locale collation.

Which varstr_cmp() does, I guess. It's what textlt uses, for example.

> See unicode collation algorithm http://www.unicode.org/reports/tr10/

Wow, that looks like a fun read.

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 13:10, Tom Lane wrote:

> We intentionally force such strings to be considered non-equal.
> See varstr_cmp, and if you like see the archives from back when
> that was put in.
>
> The = and <> operators are in fact consistent with the behavior of
> varstr_cmp (and had better be!); they're just optimized a bit.

Thank you, Tom. I'll post my updated patch shortly.

In the meantime, can anyone suggest an easy way to populate a table
full of random strings so that I can do a bit of benchmarking?

Thanks,

David
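
One common recipe (an editor's sketch, not from the thread; the table name
and row count are arbitrary):

-- Fill a table with a million random 32-character hex strings.
CREATE TABLE bench (txt text);
INSERT INTO bench
SELECT md5(random()::text)
FROM generate_series(1, 1000000);
-- md5() output is all lowercase; upper-case a random half of the rows
-- so case-folding actually has work to do.
UPDATE bench SET txt = upper(txt) WHERE random() < 0.5;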


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:
> So, the upshot is that the = and <> operators are not locale-aware,
> yes? They just do byte comparisons. Is that really the way it should
> be? I mean, could there not be strings that are equivalent but have
> different bytes?

We intentionally force such strings to be considered non-equal.
See varstr_cmp, and if you like see the archives from back when
that was put in.

The = and <> operators are in fact consistent with the behavior of
varstr_cmp (and had better be!); they're just optimized a bit.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] To store and retrieve image data in postgresql

For hot backup and restore, check this:

http://www.postgresql.org/docs/8.3/static/continuous-archiving.html

For logical backup (dump), use this:

http://www.postgresql.org/docs/8.3/static/app-pgdump.html

--- On Mon, 7/7/08, aravind chandu <avin_friends@yahoo.com> wrote:

> From: aravind chandu <avin_friends@yahoo.com>
> Subject: [GENERAL] To store and retrieve image data in postgresql
> To: pgsql-general@postgresql.org
> Date: Monday, July 7, 2008, 7:22 PM
> Hello,
>
> I need to store an image in a postgresql database, and after that I need
> to retrieve the image back. Can you please help me how to do this?
>
> Thank You,
>
> Avinash.
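
As for the storage question itself, the two usual options look roughly like
this (an editor's sketch; the table, column, and file path are hypothetical):

-- Option 1: a bytea column; the client sends and receives the raw bytes,
-- typically as a bound parameter.
CREATE TABLE images (id serial PRIMARY KEY, name text, data bytea);

-- Option 2: a large object; lo_import() runs in the server process, so the
-- path must be readable on the server machine.
SELECT lo_import('/tmp/photo.png');  -- returns the new large object's OID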



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] [pg_gen] what was the "issue" with the 8.3.2 bundle ?

greetings, all ---

in the announcement for the 8.3.3 bundle,
about four weeks ago, now,
it was stated that the 8.3.2 bundle had "issues",
but, their nature was not described.

it was stated that
these tarballs were only available for "a few days".

so, the cat in me being curious,
today, i went looking in the -bugs list archives
from may_30 to jun_13, by date,
examining only the initial post for each thread [ to save time ].
i find no entries that refer to a member of the bundle in question.
i did the same for the -general list archives, from jun_13,
but, i stopped on jun_05 when i found the post asking
when 8.3.2 would be released.

i was wondering: what was it ?
further, is this, per_chance, described, in excruciating detail,
in a location where i did not tread ?

keep up the good work.

please cc.

tia.

rob


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] PATCH: CITEXT 2.0

2008/7/7 David E. Wheeler <david@kineticode.com>:
> On Jul 7, 2008, at 12:36, Pavel Stehule wrote:
>
>>> * Does it need to be locale-aware or not?
>>
>> yes!
>
> texteq() in varlena.c does not seem to be.
>

it's case-sensitive, and it's more or less a binary compare
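
For instance (an editor's illustration):

SELECT 'Foo'::text = 'foo'::text;  -- false: the bytes differ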

Regards
Pavel Stehule

> Best,
>
> David
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

David E. Wheeler wrote:
> On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
>
>> My question is: why? Shouldn't they all use the same function for
>> comparison? I'm happy to dupe this implementation for citext, but I
>> don't understand it. Should not all comparisons be executed consistently?
>
> Let me try to answer my own question by citing this comment:
>
> /*
> * Since we only care about equality or not-equality, we can avoid all the
> * expense of strcoll() here, and just do bitwise comparison.
> */
>
> So, the upshot is that the = and <> operators are not locale-aware, yes?
> They just do byte comparisons. Is that really the way it should be? I
> mean, could there not be strings that are equivalent but have different
> bytes?

Correct. The problem is complex. It works fine only for normalized strings.
But Postgres currently assumes that all UTF8 strings are normalized.

If you need to implement the < <= >= > operators, you need to use strcoll(),
which takes care of locale collation.

See unicode collation algorithm http://www.unicode.org/reports/tr10/
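
To make the normalization point concrete (an editor's sketch; the byte
escapes assume a UTF8 database and E'' escape-string syntax):

-- "é" as one precomposed code point vs. "e" plus a combining acute accent:
SELECT E'\xC3\xA9' = E'e\xCC\x81';  -- false: same rendered text, different bytes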

Zdenek


--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic

http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] Altering a column type w/o dropping views

On Mon, Jul 07, 2008 at 05:53:58PM +0800, Ow Mun Heng wrote:
> I'm going to alter the data type of a bunch of table columns, and I'm being
> forced to drop a view which depends on the column.

Why is that a problem? If you keep your object definitions in files
(e.g., in a directory structure that's under revision control) then
you can write a deployment script like the following (to be executed
via psql):

BEGIN;
DROP VIEW view_name;
ALTER TABLE table_name ALTER column_name TYPE type_name;
\i views/view_name.sql
COMMIT;

--
Michael Fuhr

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] creating "a perfect sequence" column

On Mon, Jul 7, 2008 at 4:54 AM, Jack Brown <zidibik@yahoo.com> wrote:

> Thank you, I didn't know the term, so when I searched the archives I hadn't found much.

Here is the result of a discussion that came up a couple of years ago:

http://www.varlena.com/GeneralBits/130.php

I guess that German accounting laws require a gapless sequence for all documents.
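
The classic technique, for flavor (an editor's sketch with hypothetical
names; the article above covers the details and caveats):

-- A single-row counter: UPDATE locks the row until commit, so two
-- transactions can never take the same number, and a rollback returns
-- the number instead of leaving a gap.
CREATE TABLE doc_counter (last_value integer NOT NULL);
INSERT INTO doc_counter VALUES (0);

BEGIN;
UPDATE doc_counter SET last_value = last_value + 1;
SELECT last_value FROM doc_counter;  -- use this as the document number
COMMIT;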

> In the meantime I did some research on other RDBMSs, and I guess what I was looking for was more along the lines of, e.g., the ROW_NUMBER() function in MSSQL [1]. FWIW, this seems like the most efficient pagination solution, assuming, of course, that the RDBMS implements this functionality efficiently.
>
> [1]: http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx

Actually, this functionality is most commonly implemented in a
reporting utility, but it can be done using SQL.
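
For example, before native ROW_NUMBER() support, a self-join can number rows
in plain SQL (an editor's sketch; the docs table is hypothetical, and the
join is O(n^2), which is why reporting tools usually do this instead):

SELECT count(b.id) AS row_number, a.id
FROM docs a
JOIN docs b ON b.id <= a.id
GROUP BY a.id
ORDER BY a.id;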

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [pgsql-de-allgemein] Pattern Matching

Florian Aumeier wrote:
> How can I exclude a character sequence in a pattern in Postgres?
>
> As an example, two different URLs. The first URL should be matched,
> the second should not:
>
> a) 'http://asbojesus.wordpress.com/2007/03/02/14/'
> b) 'http://feeds.wordpress.com/1.0/goreddit/globolibro.wordpress.com/319/'

I don't know whether it can be done with a _single_ pattern.

I would use two (note the doubled backslashes: inside an E'' string,
\\. is needed to get a literal \. through to the regex engine):

(col ~ E'^http://[a-zA-Z0-9]*\\.wordpress\\.com') AND
(col !~ E'^http://[a-zA-Z0-9]*feeds.*\\.wordpress\\.com')

or something along those lines.

Kind regards,
Laurenz Albe

--
Sent via pgsql-de-allgemein mailing list (pgsql-de-allgemein@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-de-allgemein

Re: [GENERAL] creating "a perfect sequence" column

> On Sun, Jul 6, 2008 at 7:33 PM, Berend Tober wrote:
> > This question comes up a lot. A term used in prior discussions is
> > "gapless sequence".
>
Thank you, I didn't know the term, so when I searched the archives I hadn't found much.

> > What would be really more interesting for discussion on this community
> > forum is a detailed description of your actual use case and requirements.
>
It's sort of a formal document management system. The assigned gapless numbers are frozen at the end of each month. Until then, an authorized user will be able to delete a document. But ...

> reusing numbers already shown to a user is a recipe for a disaster. they
> write down the number, and two weeks later reference it, but it's not there.
>
... when I mentioned this to them (management), they accepted that it was not a really good idea after all. Thanks a lot :) I will assign numbers when closing the month, so everything will be fine.

In the meantime I did some research on other RDBMSs, and I guess what I was looking for was more along the lines of, e.g., the ROW_NUMBER() function in MSSQL [1]. FWIW, this seems like the most efficient pagination solution, assuming, of course, that the RDBMS implements this functionality efficiently.

[1]: http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general