Monday, July 7, 2008

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:
> On Jul 7, 2008, at 16:58, Tom Lane wrote:
>> If that's so, you certainly can't use strncmp, because that would
>> result
>> in sort orderings totally different from lower()'s result. Even
>> without
>> that argument, for most multibyte cases you'd get a pretty arbitrary,
>> user-unfriendly sort ordering.

> Now I'm confused again. :-( Whether or not I use strncmp() or
> varstr_cmp(), I first lowercase the value to be compared using
> str_tolower(). What Zdenek has said is, that aside, just as for the
> TEXT type, I should use strncmp() for = and <>, and varstr_cmp() for
> everything else. Are you saying something different?

No, but you were: you proposed using strncmp for everything.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Data type OID numbers fixed?

Stephen R. van den Berg wrote:

> How large is the probability of these numbers ever changing?

Really small. I'd just hardcode them.


--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] \SET QUIET and \timing

On Jul 7, 2008, at 17:14, Tom Lane wrote:

> "David E. Wheeler" <david@kineticode.com> writes:
>> Relatedly, this doesn't seem to turn off quiet mode:
>
>> \SET QUIET 0
>
> In current releases any value other than "off" (case insensitive)
> is taken as "true". IIRC this was improved recently.

Cute. Thanks.

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] \SET QUIET and \timing

"David E. Wheeler" <david@kineticode.com> writes:
> Relatedly, this doesn't seem to turn off quiet mode:

> \SET QUIET 0

In current releases any value other than "off" (case insensitive)
is taken as "true". IIRC this was improved recently.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] [pg_gen] what was the "issue" with the 8.3.2 bundle ?

spellberg_robert <emailrob@emailrob.com> writes:
> I was wondering: what was it?
> Further, is this, perchance, described in excruciating detail
> in a location where I did not tread?

Did you read the 8.3.3 release notes?

http://www.postgresql.org/docs/8.3/static/release-8-3-3.html

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[HACKERS] Data type OID numbers fixed?

I'm in the process of finishing a native protocol-3 PostgreSQL driver
written in Pike.
While trying to make it take intelligent decisions about binary versus
text encoding for different columns and parameters, I ran into the
question of type OIDs.

In looking through the specs, I decided that the following type OIDs
will be passed in binary automatically (both as results and as
parameters); all others will be passed as text:

case 16: // _bool
case 17: // _bytea
case 18: // _char
case 20: // _int8
case 21: // _int2
case 23: // _int4
case 25: // _text
case 142: // _xml
case 829: // _macaddr
case 869: // _inet
case 1042: // _bpchar
case 1043: // _varchar
case 1700: // _numeric
case 2950: // _uuid

How large is the probability of these numbers ever changing?
Should I extract them from a query at the beginning of the connection?
Or can I leave them hardcoded in the driver?
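For reference, if extracting them at the start of a connection, a query
along these lines would do it (just a sketch):

SELECT oid, typname
  FROM pg_type
 WHERE typname IN ('bool', 'bytea', 'char', 'int8', 'int2', 'int4', 'text',
                   'xml', 'macaddr', 'inet', 'bpchar', 'varchar', 'numeric',
                   'uuid');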
--
Sincerely,
Stephen R. van den Berg.

A truly wise man never plays leapfrog with a unicorn.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 16:58, Tom Lane wrote:

> "David E. Wheeler" <david@kineticode.com> writes:
>> Hrm. So in your opinion, strncmp() could be used for all comparisons
>> by citext, rather than varstr_cmp()?
>
> I thought the charter of this type was to work like lower(textcol).

Correct.

> If that's so, you certainly can't use strncmp, because that would
> result
> in sort orderings totally different from lower()'s result. Even
> without
> that argument, for most multibyte cases you'd get a pretty arbitrary,
> user-unfriendly sort ordering.

Now I'm confused again. :-( Whether or not I use strncmp() or
varstr_cmp(), I first lowercase the value to be compared using
str_tolower(). What Zdenek has said is, that aside, just as for the
TEXT type, I should use strncmp() for = and <>, and varstr_cmp() for
everything else. Are you saying something different?

I could use some examples to play with in order to ensure that things
are behaving as they should. I'll add regression tests for them.
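For instance (sketches only; the multibyte case depends on str_tolower()
and the locale):

SELECT 'FOO'::citext = 'foo'::citext;    -- expect true
SELECT 'FOO'::citext <> 'bar'::citext;   -- expect true
SELECT 'STRASSE'::citext = 'straße'::citext;  -- multibyte, locale-dependent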

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:
> Hrm. So in your opinion, strncmp() could be used for all comparisons
> by citext, rather than varstr_cmp()?

I thought the charter of this type was to work like lower(textcol).
If that's so, you certainly can't use strncmp, because that would result
in sort orderings totally different from lower()'s result. Even without
that argument, for most multibyte cases you'd get a pretty arbitrary,
user-unfriendly sort ordering.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0 v2

No, *really*

Sheesh, sorry.

David

Re: [JDBC] server-side prepared Statements


protocolVersion=V2
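(That is, as a connection URL parameter, something like
jdbc:postgresql://host:5432/db?protocolVersion=2; check your driver
version's documentation for the exact spelling it accepts.)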

Dave
On 7-Jul-08, at 11:39 AM, Thellmann, Uwe wrote:

Hi,

I have some trouble with prepared statements on my Postgres 8.1 using JDBC.
How can I prevent the server from using server-side prepared statements?
I'd like to disable those prepared statements.
I have tried to set prepareThreshold=0, but it doesn't work.

Who can help me?


Re: [HACKERS] PATCH: CITEXT 2.0 v2

And here is the script. D'oh!

Thanks,

David

New relkind (was Re: [HACKERS] Exposing quals)

On Mon, Jul 07, 2008 at 06:46:29PM -0400, Andrew Dunstan wrote:
> >
> > On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote:
> >
> >> Please find attached the patch, and thanks to Neil Conway and
> >> Korry Douglas for the code, and to Jan Wieck for helping me
> >> hammer out the scheme above. Mistakes are all mine ;)
> >
> > I see no negative comments to this patch on -hackers.
> >
> > This was discussed here
> > http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED
> > and I had understood the consensus to be that we would go ahead
> > with this?
> >
> > The notes say "Heikki doesn't think this is a long term solution",
> > but in the following discussion it was the *only* way of doing
> > this that will work with non-PostgreSQL databases. So it seems
> > like the way we would want to go, yes?
> >
> > So, can we add this to the CommitFest July page so it can receive
> > some substantial constructive/destructive comments?
> >
> > This could be an important feature in conjunction with Hot
> > Standby.
>
> The notes say at the end:
>
> "Jan thinks that showing the node tree will work better. But others
> don't agree with him -- it wouldn't work for PL/perlU. But Jan
> thinks it would work to give it a pointer to the parse tree and the
> range, we'd need to add an access function for the PL."
>
> For the record, I agree with Jan's suggestion of passing a pointer
> to the parse tree, and offline gave David a suggestion verbally as
> to how this could be handled for PL/PerlU.
>
> I don't think we should be tied too closely to a string
> representation, although possibly the first and simplest callback
> function would simply stringify the quals.

As I understand Jan's plan, the idea is to create a new relkind with
an exit to user code at leaf nodes in the plan tree. This would
require an API design for both user C code and for each PL to use, but
would then allow PostgreSQL's optimizer to work on JOINs, etc.

Jan, have I got that right so far? Do you have something in the way
of a rough patch, docs, etc. for this?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0 v2

On Jul 7, 2008, at 08:01, Andrew Dunstan wrote:

> What does still bother me is its performance. I'd like to know if
> any measurement has been done of using citext vs. a functional index
> on lower(foo).

Okay, here's a start. The attached script inserts random strings of
1-10 space-delimited words into text and citext columns, and then
compares the performance of queries with and without indexes. The
output for me is as follows:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 254.254 ms
SELECT * FROM try WHERE citext = 'food';
Time: 288.535 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.385 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 236.186 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 235.818 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 1.260 ms
SELECT * FROM try WHERE citext = 'food';
Time: 277.755 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.073 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 238.430 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 238.685 ms

So for some reason, after adding the indexes, the queries against the
CITEXT column aren't using them. Furthermore, the `lower(text) LIKE
lower(?)` query isn't using *its* index. Huh?

So this leaves me with two questions:

1. For what reason would the query against the citext column *not* use
the index?

2. Is there some way to get the CITEXT index to behave like a LOWER()
index, that is, so that its value is stored using the result of the
str_tolower() function, thus removing some of the overhead of
converting the values for each row fetched from the index? (Does this
question make any sense?)
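One guess I still need to verify: in a non-C locale a plain btree index
can't be used for LIKE prefix matching at all, and the usual workaround
is a text_pattern_ops index, something like:

CREATE INDEX try_lower_like_idx ON try (LOWER(text) text_pattern_ops);

That might explain the LIKE half of the mystery, at least.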

Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] \SET QUIET and \timing

On Jul 7, 2008, at 16:07, David E. Wheeler wrote:

> Howdy,
>
> I find that, in psql, if I set
>
> \SET QUIET 1
>
> Then, even with \timing toggled on, no timing info is output.
>
> Relatedly, this doesn't seem to turn off quiet mode:
>
> \SET QUIET 0
>
> Are these bugs?

I found that \set QUIET (on|off) works, but still, \timing has no
output when quiet is on.
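For the archives, the spelling that worked:

\set QUIET off
\timing
Timing is on.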

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] \SET QUIET and \timing

Howdy,

I find that, in psql, if I set

\SET QUIET 1

Then, even with \timing toggled on, no timing info is output.

Relatedly, this doesn't seem to turn off quiet mode:

\SET QUIET 0

Are these bugs?

Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] [pg_gen] what was the "issue" with the 8.3.2 bundle ?

I do believe this was discussed on -hackers or thereabouts. I'm not that
familiar with it myself, but that's where I'd start my search.

On Mon, Jul 7, 2008 at 1:51 PM, spellberg_robert <emailrob@emailrob.com> wrote:
On Mon, Jul 7, 2008 at 1:51 PM, spellberg_robert <emailrob@emailrob.com> wrote:
> Greetings, all ---
>
> In the announcement for the 8.3.3 bundle,
> about four weeks ago now,
> it was stated that the 8.3.2 bundle had "issues",
> but their nature was not described.
>
> It was also stated that
> those tarballs were only available for "a few days".
>
> So, the cat in me being curious,
> today I went looking in the -bugs list archives
> from May 30 to June 13, by date,
> examining only the initial post of each thread [to save time].
> I found no entries that refer to a member of the bundle in question.
> I did the same for the -general list archives, from June 13,
> but I stopped on June 5 when I found the post asking
> when 8.3.2 would be released.
>
> I was wondering: what was it?
> Further, is this, perchance, described in excruciating detail
> in a location where I did not tread?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] Exposing quals

>
> On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote:
>
>> Please find attached the patch, and thanks to Neil Conway and Korry
>> Douglas for the code, and to Jan Wieck for helping me hammer out the
>> scheme above. Mistakes are all mine ;)
>
> I see no negative comments to this patch on -hackers.
>
> This was discussed here
> http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED
> and I had understood the consensus to be that we would go ahead with
> this?
>
> The notes say "Heikki doesn't think this is a long term solution", but
> in the following discussion it was the *only* way of doing this that
> will work with non-PostgreSQL databases. So it seems like the way we
> would want to go, yes?
>
> So, can we add this to the CommitFest July page so it can receive some
> substantial constructive/destructive comments?
>
> This could be an important feature in conjunction with Hot Standby.

The notes say at the end:

"Jan thinks that showing the node tree will work better. But others don't
agree with him -- it wouldn't work for PL/perlU. But Jan thinks it would
work to give it a pointer to the parse tree and the range, we'd need to
add an access function for the PL."

For the record, I agree with Jan's suggestion of passing a pointer to the
parse tree, and offline gave David a suggestion verbally as to how this
could be handled for PL/PerlU.


I don't think we should be tied too closely to a string representation,
although possibly the first and simplest callback function would simply
stringify the quals.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [pgsql-es-ayuda] Server time

Hugo César wrote:
> Hello list members!
>
> As the subject of the mail says, I'm writing to ask a question: what
> is the correct way to get the server time to match the PC's time? I
> use the functions intended for that purpose (CURRENT_TIME,
> LOCALTIMESTAMP...), but they all give me a five-hour difference from
> the PC's time...

Most likely it is misidentifying your operating system's time zone, and
therefore falling back to GMT. Try what happens if you put the name of
your time zone in TimeZone in postgresql.conf (I suppose it could be
America/Mexico_City or another).
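(For example, assuming that zone: either timezone = 'America/Mexico_City'
in postgresql.conf, or per session with SET TimeZone TO
'America/Mexico_City';)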

--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
--
TIP 4: Don't 'kill -9' the postmaster

[pgsql-es-ayuda] Server time

Hello list members!

As the subject of the mail says, I'm writing to ask a question: what
is the correct way to get the server time to match the PC's time? I
use the functions intended for that purpose (CURRENT_TIME,
LOCALTIMESTAMP...), but they all give me a five-hour difference from
the PC's time...

I searched the postgres list and found something saying to check the
TZ variable; does that refer to an environment variable? I'm working
on Windows XP with PostgreSQL 8.2; I already checked the variables and
couldn't find it anywhere.

I'm working with the Mexico time zone. Poking around I came across the
time functions, which leads me to the following... I don't know if
this is the right approach, or whether I can define my time zone
somewhere so I don't need this little trick:

SELECT to_char(CURRENT_TIMESTAMP - interval '5 hour','HH24:MI:SS') AS hora

With this little statement I get the correct PC time...


Another thing... What value do I have to put in the timezone variable
of the postgresql.conf file? Do I have to leave it as UNKNOWN?

Thanks in advance for any comments....
--
TIP 5: Have you read our extensive FAQ?

http://www.postgresql.org/docs/faqs.FAQ.html

Re: [PERFORM] Practical upper limits of pgbench read/write tps with 8.3

On Mon, 7 Jul 2008, Jeffrey Baker wrote:

> On the single 2.2GHz Athlon, the maximum tps seems to be 1450...what's
> the bottleneck? Is PG lock-bound?

It can become lock-bound if you don't make the database scale
significantly larger than the number of clients, but that's probably not
your problem. The pgbench client driver program itself is pretty CPU
intensive and can suffer badly from kernel issues. I am unsurprised you
can only hit 1450 with a single CPU. On systems with multiple CPUs where
the single CPU running the pgbench client is much faster than your 2.2GHz
Athlon, you'd probably be able to get a few thousand TPS, but eventually
the context switching of the client itself can become a bottleneck.

Running pgbench against a RAM disk is a good way to find out where the
system bottlenecks without disk I/O involved; you might try that test on
your larger server when you get a chance. One interesting thing you may
not have tried yet is running top and watching how close pgbench itself
gets to saturating a single CPU. If you've got 4 CPUs and the pgbench
client program shows 25% utilization, it is now the bottleneck rather
than whatever you thought you were measuring. I thought this might be
the case in the last test results you reported on Friday, but didn't
have a chance to comment on it until now.

One thing you can try here is running pgbench itself on a server other
than the one hosting the database, but that seems to top out at a few
thousand TPS as well; it may get higher than you've been seeing, though.
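Something along these lines, run from the second machine, would be a
starting point (a sketch; adjust the client and transaction counts, and
check the flags against your pgbench version):

pgbench -S -c 8 -t 10000 -h dbhost pgbench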

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[GENERAL] Query running slow but was running fine before


PG: PostgreSQL 8.1.4
OS: RHEL 4.x

I have a set of queries on a production server that have been running fine for the past few months but as of last Friday started performing poorly.  I have isolated the problem down to a particular part that is common to all queries involved and have provided an example here of what the issue is.  There are multiple tables where this is a problem, not just one.

An analyze is run on the tables every day (even several times a day because they are updated very frequently) and a vacuum analyze is run on the weekends.  I also tried to run an analyze specifically on the customer_id column and then the product_id column but that didn’t help.

This one table listed is part of a UNION ALL that joins ten separate tables.  The “real” query is against the view kda_log_info.  I am working with just one of the tables within the view to help narrow down where the problem might be.

There may be an occasion when the product_id value is null, so this is how the query is written.  It uses the index on customer_id and does a filter on product_id.  The cost is high and the time it takes to scan the 964 rows is about one minute on average (where before it would return in sub-second time):

kadams@hostserver> more t4.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 2070101833));


kadams@hostserver> psql -Upostgres -dkda_log -ft4.sql
                                                                                      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=426537.34..426537.35 rows=1 width=0)
   ->  Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24  (cost=0.00..426535.75 rows=635 width=0)
         Index Cond: ((customer_id = 7767664) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
         Filter: ((product_id IS NULL) OR (product_id = 2070101833))
(4 rows)


kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
 count
-------
   964
(1 row)


real    0m54.810s
user    0m0.002s
sys    0m0.001s


kda_log=# select count(*) from kda_log_info_2008w24;
  count   
----------
 16356303
(1 row)


Index:
kda_log_info_cid_cre_dom_2008w24_idx btree (customer_id, created, "domain")



On the same dataset, if I eliminate the IS NULL and look for just the specific product_id the optimizer picks the index on the product_id column and filters on customer_id and scans the same 964 rows in 2 milliseconds:


select count(*)
from kda_log_info_2008w24
where customer_id = 7767664
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
--and ((product_id IS NULL) OR (product_id = 2070101833))
and product_id = 2070101833;

kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
                                                                                      QUERY PLAN                                                                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1649.41..1649.42 rows=1 width=0)
   ->  Index Scan using kda_log_info_did_cre_2008w24_idx on kda_log_info_2008w24  (cost=0.00..1647.82 rows=635 width=0)
         Index Cond: ((product_id = 2070101833) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
         Filter: (customer_id = 7767664)
(4 rows)


kadams@hostserver> time psql -Upostgres -dkda_log -ft4.sql
 count
-------
   964
(1 row)


real    0m0.207s
user    0m0.004s
sys    0m0.002s



There are currently no rows in the table where product_id is NULL:


mxl_log=# select count(*) from kda_log_info_2008w24 where product_id IS NULL;  
-------
     0
(1 row)


Index:
kda_log_info_did_cre_2008w24_idx btree (domain_id, created)




As another comparison I ran this same query on a different database server with the same database layout (different data set, of course) which returns the same approximate number of rows for the given customer_id / product_id and it returns in sub-second time. This table has even more rows in it than on the server where performance has tanked:


kadams@hostserver2> more t2.sql
select count(*)
from kda_log_info_2008w24
where customer_id = 907
AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone)
AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone)
and ((product_id IS NULL) OR (product_id = 573351));


kadams@hostserver2> psql -Upostgres -dkda_log -ft2.sql
                                                                                    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2626.36..2626.37 rows=1 width=0)
   ->  Index Scan using kda_log_info_cid_cre_dom_2008w24_idx on kda_log_info_2008w24  (cost=0.00..2626.35 rows=1 width=0)
         Index Cond: ((customer_id = 907) AND (created >= '2008-06-01 00:00:00-06'::timestamp with time zone) AND (created < '2008-07-01 00:00:00-06'::timestamp with time zone))
         Filter: ((product_id IS NULL) OR (product_id = 573351))
(4 rows)


kadams@hostserver2> time psql -Upostgres -dmxl_log -ft2.sql
 count
-------
   992
(1 row)


real    0m0.011s
user    0m0.002s
sys    0m0.002s


kda_log=# select count(*) from kda_log_info_2008w24;
  count   
----------
 21777364
(1 row)



I compared postgresql.conf file settings between the servers and they are identical.  

The physical servers are also the same configuration (8 CPU, 8 GB RAM, local fast-SCSI disk).

So, I have no idea why the query in the top of this email on this particular database server has gone from sub-second response to over a minute on average.  As only part of the overall query, times have gone from a few seconds to 15-20 minutes each, which is causing major problems for our users.

Any idea what else I can look at or do to resolve this problem?  Any additional information I can provide to help you guys figure this out?
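For what it's worth, one rewrite I can try is splitting the OR so each
arm can use an index on its own; since the two arms are disjoint, the
sum equals the original count (a sketch, not yet verified here):

select (select count(*) from kda_log_info_2008w24
         where customer_id = 7767664
           and created >= '2008-06-01 00:00:00-06'::timestamp with time zone
           and created < '2008-07-01 00:00:00-06'::timestamp with time zone
           and product_id = 2070101833)
     + (select count(*) from kda_log_info_2008w24
         where customer_id = 7767664
           and created >= '2008-06-01 00:00:00-06'::timestamp with time zone
           and created < '2008-07-01 00:00:00-06'::timestamp with time zone
           and product_id is null) as count;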


PS: Yes, I know..... 8.1 is old..... We are migrating to 8.3.x in the fall.


Thanks for your help,

Keaton

Re: [HACKERS] the un-vacuumable table

On Mon, Jul 7, 2008 at 2:08 PM, Andrew Hammond
<andrew.george.hammond@gmail.com> wrote:
> On Mon, Jul 7, 2008 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, I suggest filing a bug with your kernel distributor --- ENOSPC was
>> a totally misleading error code here. Seems like EIO would be more
>> appropriate. They'll probably want to see the kernel log.
>
> It's FreeBSD 6.2 (yes, I know it's EoL). I'll submit a bug including this email.

http://www.freebsd.org/cgi/query-pr.cgi?pr=125382


Andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

Tom Lane wrote:
> target would typically be around 10 or so. It really wasn't designed to
> be industrial-strength code. (It was still a big improvement over what
> we'd had before :-(.) So I'm not very comfortable with taking it as the
> design base for something that does need to be industrial-strength.

I did some googling and found a paper describing a method of determining
most frequent elements from a stream called Lossy Counting:
http://www.vldb.org/conf/2002/S10P03.pdf
The following is an adaptation of an algorithm described in section 4.2
of that paper. I'll summarize it and propose an implementation using
existing PG data structures.

The data structure (called D) is a set of entries in the form (e, f, d),
where e is the lexeme, f is an integer representing the estimated
frequency and d is the maximum possible error in f. We process tsvectors
in batches. After each w tsvectors we will be pruning our data
structure, deleting entries that are not frequent enough. Let b be the
number of the current batch (tsvectors 1 to w are processed with b = 1,
tsvectors w+1 to w*2 are processed with b = 2 and so on). Observe that w
is an arbitrary number.
Initially, D is empty. For each lexeme e we look up its entry in D. If
it's found, we increment f in that entry. If not, we add a new entry in
the form of (e, 1, b - 1).
After each w tsvectors we delete all entries from D that have f + d <= b.

For example, consider such a list of tsvectors (numbers stand for
lexemes, each line is a tsvector):
1 2 3
2 3 4 5
1 2
1 3 4
2 3 4 5
2 4 5 6

Let w = 3.
After processing the first tsvector D would look like so:
(1, 1, 0), (2, 1, 0), (3, 1, 0)
After processing the second tsvector D would be:
(1, 1, 0), (2, 2, 0), (3, 2, 0), (4, 1, 0), (5, 1, 0)
After the third tsvector we'd have:
(1, 2, 0), (2, 3, 0), (3, 2, 0), (4, 1, 0), (5, 1, 0)

And now we'd do the pruning. b = 1, so all entries with f + d <= 1 are
discarded. D is then:
(1, 2, 0), (2, 3, 0), (3, 2, 0)

After processing the fourth tsvector we get:
(1, 3, 0), (2, 3, 0), (3, 3, 0), (4, 1, 1)
And then:
(1, 3, 0), (2, 4, 0), (3, 4, 0), (4, 2, 1), (5, 1, 1)
And finally:
(1, 3, 0), (2, 5, 0), (3, 4, 0), (4, 3, 1), (5, 2, 1), (6, 1, 1)

Now comes the time for the second pruning. b = 2, so D is left with:
(1, 3, 0), (2, 5, 0), (3, 4, 0), (4, 3, 1), (5, 2, 1)

Finally we return entries with the largest values of f, their number
depending on statistics_target. That ends the algorithm.

The original procedure from the paper prunes the entries after each
"element batch" is processed, but the batches are of equal size. With
tsvectors it seems sensible to divide lexemes into batches of different
sizes, so the pruning always takes place after all the lexemes from a
tsvector have been processed. I'm not totally sure if that change does
not destroy some properties of that algorithm, but it looks reasonably sane.

As to the implementation: like Tom suggested, we could use a hashtable
as D and an array of pointers to the hashtable entries, that would get
updated each time a new hashtable entry would be added. Pruning would
then require a qsort() of that table taking (f + d) as the sort key, but
other than that we could just add new entries at the end of the array.
An open question is: what size should that array be? I think that by
knowing how long is the longest tsvector we could come up with a good
estimate, but this needs to be thought over. Anyway, we can always
repalloc().

Getting the max tsvector length by looking at the datum widths would be
a neat trick. I just wonder: if we have to detoast all these tsvectors
anyway, is it not possible to do that and do two passes over already
detoasted datums? I've got to admit I don't really understand how
toasting works, so I'm probably completely wrong on that.

Does this algorithm sound sensible? I might not have understood some
things from that publication; they give some proofs of why their
algorithm yields good results and takes up little space, but I'm not
all that certain it can be so easily adapted.

If you think the Lossy Counting method has potential, I could test it
somehow. Using my current work I could extract a stream of lexemes as
ANALYZE sees it and run it through a python implementation of the
algorithm to see if the result makes sense.
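For reference, a minimal sketch of what that Python test implementation
could look like, following the (e, f, d) scheme above:

def lossy_count(tsvectors, w):
    # D maps lexeme e -> [f, d]; b is the current batch number.
    D = {}
    b = 1
    seen = 0
    for tsvector in tsvectors:
        for lexeme in tsvector:
            if lexeme in D:
                D[lexeme][0] += 1        # found: increment f
            else:
                D[lexeme] = [1, b - 1]   # new entry (e, 1, b - 1)
        seen += 1
        if seen % w == 0:                # prune after every w tsvectors
            for e in list(D):
                f, d = D[e]
                if f + d <= b:
                    del D[e]
            b += 1
    return D

# The worked example above: w = 3, six tsvectors of integer "lexemes".
ts = [[1, 2, 3], [2, 3, 4, 5], [1, 2], [1, 3, 4], [2, 3, 4, 5], [2, 4, 5, 6]]
print(lossy_count(ts, 3))
# prints {1: [3, 0], 2: [5, 0], 3: [4, 0], 4: [3, 1], 5: [2, 1]}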

This is getting more complicated, than I thought it would :^)

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing quals

On Mon, 2008-05-05 at 12:01 -0700, David Fetter wrote:

> Please find attached the patch, and thanks to Neil Conway and Korry
> Douglas for the code, and to Jan Wieck for helping me hammer out the
> scheme above. Mistakes are all mine ;)

I see no negative comments to this patch on -hackers.

This was discussed here
http://wiki.postgresql.org/wiki/PgCon_2008_Developer_Meeting#SQL.2FMED
and I had understood the consensus to be that we would go ahead with
this?

The notes say "Heikki doesn't think this is a long term solution", but
in the following discussion it was the *only* way of doing this that
will work with non-PostgreSQL databases. So it seems like the way we
would want to go, yes?

So, can we add this to the CommitFest July page so it can receive some
substantial constructive/destructive comments?

This could be an important feature in conjunction with Hot Standby.

--
Simon Riggs

www.2ndQuadrant.com

PostgreSQL Training, Services and Support


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [pgsql-es-ayuda] Migrating a LATIN1 database to UTF

Germán C. Basisty wrote:

> I have a client who has been running a LATIN1 database on PostgreSQL 8.0
> for some time, and I want to migrate the engine to 8.3 while changing
> that database's encoding to UTF.
>
> What would be the elegant way to perform this maneuver? Backing up the
> database and then restoring does not work well; many Spanish-language
> symbols are interpreted incorrectly.

Do the pg_dump normally (in Latin1), and then

PGOPTIONS="-c client_encoding=latin1" pg_restore your-database-dump.dump

I think that should work.

(Obviously, a utf8 locale should have been chosen when running initdb
on 8.3.)

--
Alvaro Herrera

http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
--
TIP 2: you can unsubscribe from all the lists at once
(send "unregister YourEmailAddress" to majordomo@postgresql.org)

Re: [pgsql-es-ayuda] Migrating a LATIN1 database to UTF

Germán C. Basisty wrote:
>
> Esteemed colleagues:
>
> I have a client who has been running a LATIN1 database on PostgreSQL
> 8.0 for some time, and I want to migrate the engine to 8.3 while
> changing that database's encoding to UTF.
>
> What would be the elegant way to perform this maneuver? Backing up
> the database and then restoring does not work well; many
> Spanish-language symbols are interpreted incorrectly.
>
> Regards,
>
> Germán C. Basisty
> Estudio Informático Patagónico
> Consultant - Information Technology
> german.basisty@eipsistemas.ath.cx
> http://www.eipsistemas.ath.cx
>
Use SET client_encoding in the terminal and then do the import; that
way PostgreSQL takes care of the conversion to UTF8.
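For example, in psql, assuming a plain SQL dump (the file name here is a
placeholder):

SET client_encoding TO 'LATIN1';
\i your-dump.sql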

--
Regards,

Julio Cesar Sánchez González.

--
Now I have become Death, the destroyer of worlds.
I am Death who takes everything away, the source of things to come.

www.sistemasyconectividad.com.mx

http://darkavngr.blogspot.com/

--
TIP 7: don't forget to increase the "free space map" configuration

[HACKERS] 8.1 index corruption woes

#!/usr/bin/perl
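# Usage: feed a TOC on stdin, e.g. (script name and relfilenodes are
# just examples):
#
#   perl index_checker.pl <<EOF
#   table: 273350
#   index: 273375
#   EOF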

use warnings;
use strict;

my $PG_FILEDUMP = "pg_filedump-8.1";

process_table();
exit();

sub process_table {
    my ($table, @indexes) = read_toc();

    foreach my $elem ($table, @indexes) {
        die "file $elem does not exist" unless -f "$elem";
        if (!-f "$elem.dump") {
            print STDERR "generating pg_filedump for $elem\n";
            `$PG_FILEDUMP -i $elem > $elem.dump`;
        }
    }

    print STDERR "loading unused line pointers for table $table\n";
    my $unused = get_heap_unused($table);

    foreach my $index (@indexes) {
        print STDERR "processing index $index\n";
        process_index($unused, $index);
    }
}

# Reads a "toc" file, which is a description of a table and its indexes. A
# table line is "table: xxx" where xxx is the relfilenode of the table, and an
# index line is "index: xxx" where xxx is the relfilenode of the index.
sub read_toc {
    my $table;
    my @indexes;

    print STDERR "now expecting a TOC in stdin...\n";
    while (<>) {
        if (/^table: ([0-9]+)$/) {
            $table = $1;
        }
        if (/^index: ([0-9]+)$/) {
            push @indexes, $1;
        }
    }
    print STDERR "finished reading the TOC for table $table\n";
    print STDERR ("indexes: ", join(", ", @indexes), "\n");
    return $table, @indexes;
}

# Reads a pg_filedump -i report for the given table, and fetches into a hash
# all the line pointers that are marked "unused". Returns the hash so built.
sub get_heap_unused {
    my $table = shift;
    my $curblock;
    my $unused = {};

    my $infile = "$table.dump";
    open IN, "<", "$infile" or die "can't open $infile: $!";
    while (<IN>) {
        if (/^Block\s+([0-9]+) /) {
            $curblock = $1;
        }

        if (/^ Item\s+([0-9]+) -- Length:.*Flags: 0x00/) {
            push @{$unused->{$curblock}}, $1;
        }
    }

    return $unused;
}

# Given an index and a hash built by get_heap_unused, reads the pg_filedump -i
# report for that index and prints a list of all index pointers that point to
# any element in the hash.
sub process_index {
    my $unused = shift;
    my $index = shift;

    my $curblock;
    my $special = 0;
    my $isleaf = 0;
    my $collect = 0;
    my @lines = ();

    my $infile = "$index.dump";

    open IN, "<", "$infile" or die "can't open $infile: $!";
    while (<IN>) {
        if (/^Block\s+([0-9]+) /) {
            $curblock = $1;
        }
        if (/^<Data> --/) {
            $collect = 1;
            next;
        }

        if (/^<Special Section> -----$/) {
            $special = 1;
            next;
        }

        if ($collect) {
            push @lines, $_;
        }

        if ($special) {
            if (/^ Flags.*LEAF/) {
                $isleaf = 1;
            }
        }

        if (/^$/) {
            if ($special) {
                if ($isleaf) {
                    report_broken_block($unused,
                        $index, $curblock, @lines);
                }
                $isleaf = 0;
                $special = 0;
                @lines = ();
            }
        }
    }
}

# workhorse for process_index; gets an array of lines comprising an index
# block Data entries, and prints those that match the unused hash.
sub report_broken_block {
    my $unused = shift;
    my $indexid = shift;
    my $block = shift;
    my $item;

    foreach my $line (@_) {
        if ($line =~ /^ Item\s+([0-9]+)/) {
            $item = $1;
        }
        if ($line =~ /^ Block Id:\s+([0-9]+)\s+linp Index: ([0-9]+)/) {
            my $blk = $1;
            my $lp = $2;

            next unless defined $unused->{$blk};
            foreach my $hlp (@{$unused->{$blk}}) {
                next unless $lp eq $hlp;

                print "INDEX PTR TO UNUSED HEAP: " .
                    "index $indexid ($block,$item) -> ($blk, $lp)\n";
            }
        }
    }
}

Hi,

We've detected what I think is some sort of index corruption in 8.1.
The server is running 8.1.11, so AFAICT the problem with truncated pages
in vacuum is already patched and accounted for (i.e. we reindexed, and a
bit later the problem presented itself again). There haven't been any
relevant fixes after that AFAICT.

What we see is that after a bit of updating the index, it starts having
tuples that point to heap entries which are marked unused.

I detected one of these almost by accident, and then built a tool to
discover them by examining pg_filedump output. Basically what it does
is scan the heap, note which heap tuples are marked "unused", and then
scan the index and for each index tuple in leaf pages, see if it points
to an unused heap tuple. The number of occurrences is amazingly high.
Right after a reindex there isn't any occurrence; but after a while of
application load, a lot of them appear.

I catalogued this as index corruption: the theory is that as soon as the
unused heap tuple is reused, the index will have a pointer with the
wrong index key pointing to a live heap tuple.

(We saw an occurrence of this too, but I wasn't motivated enough to
write a tool to verify the data in index vs. heap tuples.)

However, seeing the high prevalence of the problem, I started
considering whether the tool is misreading the output anyway -- i.e.
maybe it's a bug in the tool, or a misconception on my part.

The tool output looks like this:

INDEX PTR TO UNUSED HEAP: index 273375 (78,18) -> (5530, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (96,84) -> (5436, 3)
INDEX PTR TO UNUSED HEAP: index 273375 (111,1) -> (1317, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (145,1) -> (1665, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (174,1) -> (2656, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (199,1) -> (1745, 21)
INDEX PTR TO UNUSED HEAP: index 273375 (207,1) -> (358, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (214,1) -> (2800, 17)
(many more lines of the same stuff)

What this means (first line) is that on index 273375, page 78, offset
18, there's a pointer to heap page 5530, offset 17; but that heap offset
shows up as unused. This is from the heap:

Block 5530 ********************************************************
<Header> -----
Block Offset: 0x02b34000 Offsets: Lower 136 (0x0088)
Block: Size 8192 Version 3 Upper 992 (0x03e0)
LSN: logid 130 recoff 0xd53b3090 Special 8192 (0x2000)
Items: 29 Free Space: 856
Length (including item array): 140

[ ... snip other items ...]

Item 17 -- Length: 0 Offset: 504 (0x01f8) Flags: 0x00

This is from the index:

Block 78 ********************************************************
<Header> -----
Block Offset: 0x0009c000 Offsets: Lower 788 (0x0314)
Block: Size 8192 Version 3 Upper 5104 (0x13f0)
LSN: logid 131 recoff 0x01b3c6e0 Special 8176 (0x1ff0)
Items: 192 Free Space: 4316
Length (including item array): 792

[ ... snip other items ... ]

Item 18 -- Length: 16 Offset: 5184 (0x1440) Flags: USED
Block Id: 5530 linp Index: 17 Size: 16
Has Nulls: 0 Has Varwidths: 0


Of course, we're very concerned about this problem. Personally I am
prepared to believe that this could be a hardware problem, because no
one else seems to be reporting this kind of thing, but I don't see how
it could cause this particular issue and not more wide-ranging data
corruption.

I would like comments on whether this is really a problem or I am just
misreading pg_filedump output. If we confirm this, we can try to
proceed to investigate it further.

I attach the checker tool in case it is of any interest.

Thanks,

--
Alvaro Herrera

http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [SQL] how to control the execution plan ?

On Mon, Jul 7, 2008 at 3:14 AM, Sabin Coanda <sabin.coanda@deuromedia.ro> wrote:
> Hi there,
>
> I try to execute the following statement:
>
> SELECT *
> FROM (
> SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A
> FROM "TABLE_A" bp
> JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
> JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
> WHERE pn."Editor"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL AND
> bp."COL_A"::text <> ''::text
> ) x
> WHERE (x.ALIAS_A::text ) IS NULL;
>
> The problem is the excution plan first make Seq Scan on "TABLE_A", with
> Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> ''::text) AND
> (("MY_FUNCTION_A"("COL_A", NULL::boolean))::text IS NULL))". This way,
> MY_FUNCTION_A crashes for some unsupported data provided by "COL_A".
>
> I'd like to get an execution plan which is filtering first the desired rows,
> and just after that compute te column value "MY_FUNCTION_A"(bp."COL_A",
> NULL::boolean).

Just wondering what the query plans look like here, both regular
explain, and if you can wait for it to execute, explain analyze.

I'm guessing that the function is not indexed / indexable. Is it
marked immutable (and is it actually immutable) or stable (and is
it actually stable)?

If it's immutable then you can create an index on it and that should
speed things up.
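For illustration with the names from the query above (assuming the
function really is immutable and is safe to evaluate over every stored
value):

CREATE INDEX table_a_func_idx
    ON "TABLE_A" ("MY_FUNCTION_A"("COL_A", NULL::boolean));

Postgres will refuse to build such an expression index unless the
function is declared IMMUTABLE.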

>
> I made different combinations, including a subquery like:
>
> SELECT *
> FROM (
> SELECT "MY_FUNCTION_A"(y."COL_A", NULL::boolean) AS ALIAS_A
> FROM (
> SELECT bp."COL_A"
> FROM "TABLE_A" bp
> JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
> JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
> WHERE pn."COL_E"::text ~~ 'Some%'::text AND bp."COL_A" IS NOT NULL
> AND bp."COL_A"::text <> ''::text
> ) y
> ) x
> WHERE (x.ALIAS_A::text ) IS NULL;
>
> but the postgres planner is too 'smart' and optimizes it as in the previous
> case, with the same Seq Scan on "TABLE_A", and with the same filter.
>
> I thought about changing the function MY_FUNCTION_A to support any
> argument data, but even then another performance problem would arise,
> because the function would be computed for every row in the join, even
> those that could be removed by other filters.
>
> Do you have a solution please ?

If it's still too smart, you can run two queries, one to pull the set
you want to work with from the custom function into a temp table, then
analyze it, then run the query against that.
Not an optimal solution, but it might be the fastest if you can't
index your function.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [HACKERS] the un-vacuumable table

On Mon, Jul 7, 2008 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
>>> On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Have you looked into the machine's kernel log to see if there is any
>>>> evidence of low-level distress (hardware or filesystem level)?
>
>> Jun 19 03:06:14 db1 kernel: mpt1: attempting to abort req
>> 0xffffffff929b9f88:6812 function 0
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): WRITE(16). CDB: 8a 0 0 0
>> 0 1 6c 99 9 c0 0 0 0 20 0 0
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): CAM Status: SCSI Status Error
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): SCSI Status: Check Condition
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): UNIT ATTENTION asc:29,0
>> Jun 19 03:06:14 db1 kernel: (da1:mpt1:0:0:0): Power on, reset, or bus
>> device reset occurred
>> [etc]
>
>> I think this is a smoking gun.
>
> Yeah, sure looks like one. Time to replace that disk drive?

Well, that's an Apple XRaid device on the other side of an LSI
FiberChannel HBA. I'll see if it has any error messages in it's logs.

> Also, I suggest filing a bug with your kernel distributor --- ENOSPC was
> a totally misleading error code here. Seems like EIO would be more
> appropriate. They'll probably want to see the kernel log.

It's FreeBSD 6.2 (yes, I know it's EoL). I'll submit a bug including this email.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[PERFORM] Practical upper limits of pgbench read/write tps with 8.3

I'm spending a third day testing with the ioDrive, and it occurred to
me that I should normalize my tests by mounting the database on a
ramdisk. The results were surprisingly low. On the single 2.2GHz
Athlon, the maximum tps seems to be 1450. This is achieved with a
single connection. I/O rates to and from the ramdisk never exceed
50MB/s on a one-minute average.

With the flash device on the same benchmark, the tps rate is 1350,
meaning that as far as PostgreSQL is concerned, on this machine, the
flash device achieves 90% of the best possible performance.

Question being: what's the bottleneck? Is PG lock-bound?

-jwb

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 13:59, Gregory Stark wrote:

> Of course the obvious case of two equivalent strings with different
> bytes
> would be two strings which differ only in case in a collation which
> doesn't
> distinguish based on case. So you obviously can't take this route
> for citext.

Well, to be fair, citext isn't imposing a collation. It's just calling
str_tolower() on strings before passing them on to varstr_cmp() or
strncmp() to compare.

> I don't think you have to worry about the problem that caused
> Postgres to make
> this change. IIRC it was someone comparing strings like paths and
> usernames
> and getting false positives because they were in a Turkish locale
> which found
> certain sequences of characters to be insignificant for ordering.
> Someone
> who's using a citext data type has obviously decided that's
> precisely the kind
> of behaviour they want.

Hrm. So in your opinion, strncmp() could be used for all comparisons
by citext, rather than varstr_cmp()?

Thanks,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[pdxpug] Sign up for PDXPUG PgDay! July 20, 2008

If you have trouble viewing or submitting this form, you can fill it out online:
http://spreadsheets.google.com/viewform?key=paoTJ9uEi8vIqumUzwzrCAw&email=true

Sign up for PDXPUG PgDay! July 20, 2008

Please sign up for free admission to the after-party at the Gotham Tavern, and to reserve a t-shirt. For attendance we are requesting a $20 donation, which will be given to Software in the Public Interest. You can pay with check or cash when you arrive at the conference.

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:

> On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
>
>> My question is: why? Shouldn't they all use the same function for
>> comparison? I'm happy to dupe this implementation for citext, but I don't
>> understand it. Should not all comparisons be executed consistently?
>
> Let me try to answer my own question by citing this comment:
>
> /*
> * Since we only care about equality or not-equality, we can avoid all
> the
> * expense of strcoll() here, and just do bitwise comparison.
> */
>
> So, the upshot is that the = and <> operators are not locale-aware, yes? They
> just do byte comparisons. Is that really the way it should be? I mean, could
> there not be strings that are equivalent but have different bytes?

There could be strings that strcoll returns 0 for even though they're not
identical. However, that caused problems in Postgres, so we decided that
only identical strings should actually compare equal. So if strcoll
returns 0 then we do a bytewise comparison to impose an arbitrary ordering.

Of course the obvious case of two equivalent strings with different bytes
would be two strings which differ only in case in a collation which doesn't
distinguish based on case. So you obviously can't take this route for citext.

I don't think you have to worry about the problem that caused Postgres to make
this change. IIRC it was someone comparing strings like paths and usernames
and getting false positives because they were in a Turkish locale which found
certain sequences of characters to be insignificant for ordering. Someone
who's using a citext data type has obviously decided that's precisely the kind
of behaviour they want.

--
Gregory Stark
EnterpriseDB

http://www.enterprisedb.com

Ask me about EnterpriseDB's RemoteDBA services!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [pgsql-es-ayuda] how to define tables so that queries are fast?

It's a matter of performance. I did NOT say that normalization had
anything to do with data types but, obviously, using char primary keys
would perform poorly....
Apologies if it came across wrong...

--
/*
## Emanuel CALVO FRANCO. ##
~-~- 3m4nuek ~-~-
*/
--
TIP 2: you can unsubscribe from all the lists at once
(send "unregister YourEmailAddress" to majordomo@postgresql.org)

Re: [GENERAL] To store and retrive image data in postgresql

On Jul 7, 2008, at 12:22 PM, aravind chandu wrote:
> I need to store an image in a PostgreSQL database and after
> that I need to retrieve the image back. Can you please help me
> figure out how to do this?

Assuming you mean an image as in a binary visual image (like a JPEG),
the data type you want is BYTEA. How to actually get BYTEA data into
and out of PostgreSQL depends on what client library and language you
are using.

If your application allows it, it is often far more efficient to
store a reference to the image in the database (such as a URI or file
path), rather than the image itself.
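For example, a minimal sketch of such a table (names are placeholders):

CREATE TABLE images (
    id       serial PRIMARY KEY,
    filename text,
    data     bytea   -- the raw image bytes
);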

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.
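For illustration: a grouping expression like "col IS NULL" can only come
out true or false, never NULL, so it contributes at most two groups
(table and column names below are placeholders):

SELECT (col IS NULL) AS col_is_null, count(*)
  FROM some_table
 GROUP BY (col IS NULL);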

Tags:
----
REL8_0_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.169.4.6 -> r1.169.4.7)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.169.4.6&r2=1.169.4.7)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL8_1_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.191.2.5 -> r1.191.2.6)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.191.2.5&r2=1.191.2.6)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL8_2_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.214.2.9 -> r1.214.2.10)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.214.2.9&r2=1.214.2.10)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL7_4_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.147.2.5 -> r1.147.2.6)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.147.2.5&r2=1.147.2.6)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Tags:
----
REL8_3_STABLE

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.243 -> r1.243.2.1)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.243&r2=1.243.2.1)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[COMMITTERS] pgsql: Fix estimate_num_groups() to assume that GROUP BY expressions

Log Message:
-----------
Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean
results always contribute two groups, regardless of the expression contents.
This is very substantially more accurate than the regular heuristic for
certain boolean tests like "col IS NULL". Per gripe from Sam Mason.

Back-patch to all supported releases, since the behavior of
estimate_num_groups() hasn't changed all that much since 7.4.

Modified Files:
--------------
pgsql/src/backend/utils/adt:
selfuncs.c (r1.249 -> r1.250)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/selfuncs.c?r1=1.249&r2=1.250)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[pgsql-es-ayuda] Migrating a LATIN1 database to UTF8

Dear colleagues:

I have a client who has been running a LATIN1 database on PostgreSQL 8.0 for
some time, and I want to migrate the server to 8.3 while also changing that
database's encoding to UTF8.

What would be an elegant way to carry out this maneuver? Backing up the
database and then restoring it does not work well; many Spanish-language
characters come out interpreted incorrectly.

Regards,

Germán C. Basisty
Estudio Informático Patagónico
Consultant - Information Technology
german.basisty@eipsistemas.ath.cx
http://www.eipsistemas.ath.cx
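
The usual recipe (an editor's sketch, not from the thread; database and file
names are hypothetical) is to dump with the newer pg_dump, forcing UTF8 as
the dump's client encoding, and restore into a database created as UTF8:

pg_dump -E UTF8 mydb > mydb.sql     # use 8.3's pg_dump; -E sets the encoding
createdb -E UTF8 mydb_utf8
psql -d mydb_utf8 -f mydb.sql

Garbled characters after a plain dump/restore often indicate that
client_encoding did not match the data during the restore.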


[pgsql-de-allgemein] == Weekly PostgreSQL Newsletter - July 06, 2008 ==

The original article can be found at:
http://people.planetpostgresql.org/dfetter/index.php?/archives/180-PostgreSQL-Weekly-News-July-06-2008.html


== Weekly PostgreSQL Newsletter - July 06, 2008 ==

The July commitfest has begun. Now get reviewing :)
http://wiki.postgresql.org/wiki/CommitFest:2008-07

== PostgreSQL Product News ==

MicroOLAP Database Designer 1.2.4 for PostgreSQL released.
http://microolap.com/products/database/postgresql-designer/

== PostgreSQL Jobs in July ==

http://archives.postgresql.org/pgsql-jobs/2008-07/threads.php

== PostgreSQL Local ==

pgDay Portland is on July 20, right before OSCON.
http://pugs.postgresql.org/node/400

PGCon Brazil 2008 will take place September 26-27 at Unicamp in
Campinas.
http://pgcon.postgresql.org.br/index.en.html

PGDay.IT 2008 will take place October 17 and 18 in Prato.
http://www.pgday.org/it/

== PostgreSQL in the News ==

Planet PostgreSQL: http://www.planetpostgresql.org/

General Bits, archives and occasional news articles:
http://www.varlena.com/GeneralBits/

This weekly PostgreSQL newsletter was put together by David
Fetter.

Send news and announcements by Sunday at 3:00pm Pacific time. Please
send English submissions to david@fetter.org, German ones to
pwn@pgug.de, Italian ones to pwn@itpug.org.

== Applied Patches ==

Heikki Linnakangas committed:

- Turn PGBE_ACTIVITY_SIZE into a GUC variable,
track_activity_query_size. As the buffer could now be a lot larger than
before, and copying it could thus be a lot more expensive than before,
use strcpy instead of memcpy to copy the query string, as was already
suggested in comments. Also, only copy the PgBackendStatus struct and
string if the slot is in use. Patch by Thomas Lee, with some changes
by me.

- Extend VacAttrStats to allow typanalyze functions to store statistic
values of different types than the underlying column. The capability
isn't yet used for anything, but will be required by upcoming patch
to analyze tsvector columns. Jan Urbanski

- In pgsql/src/bin/pg_dump/pg_dump.c, move volatility, language, etc.
modifiers before function body in the pg_dump output for CREATE
FUNCTION. This makes it easier to read especially if the function
body is long. Original idea and patch by Greg Sabino Mullane,
though this is a stripped down version of that.

Teodor Sigaev committed:

- ltree support for multibyte encodings. Patch was made by Weiping
(laser) He, with some editorialization by me.

- In pgsql/src/backend/access/gin/ginscan.c, fix initialization of
GinScanEntryData.partialMatch

Bruce Momjian committed:

- Add psql TODO item: "Add option to wrap column values at whitespace
boundaries, rather than chopping them at a fixed width. Currently,
'wrapped' format chops values into fixed widths. Perhaps the word
wrapping could use the same algorithm documented in the W3C
specification."

- Add psql TODO item: "Add 'auto' expanded mode that outputs in
expanded format if "wrapped" mode can't wrap the output to the
screen width."

- Fix recovery.conf boolean variables to take the same range of string
values as postgresql.conf.

- Issue psql connection warnings on connection start and via \c, per
observation by David Fetter.

- Add to TODO: "Fix TRUNCATE ... RESTART IDENTITY so its effect on
sequences is rolled back on transaction abort."

- Add URL for TODO: "Add database and transaction-level triggers."

- In pgsql/doc/src/sgml/config.sgml, documentation patch by Kevin L.
McBride explaining GUC lock variables, which are available if
LOCK_DEBUG is defined.

- In pgsql/src/include/c.h, update source code comment about when to
use gettext_noop().

Tom Lane committed:

- Teach autovacuum how to determine whether a temp table belongs to a
crashed backend. If so, send a LOG message to the postmaster log,
and if the table is beyond the vacuum-for-wraparound horizon,
forcibly drop it. Per recent discussions. Perhaps we ought to
back-patch this, but it probably needs to age a bit in HEAD first.

- In pgsql/src/timezone/pgtz.c, fix identify_system_timezone() so that
it tests the behavior of the system timezone setting in the current
year and for 100 years back, rather than always examining years
1904-2004. The original coding would have problems distinguishing
zones whose behavior diverged only after 2004; which is a situation
we will surely face sometime, if it's not out there already. In
passing, also prevent selection of the dummy "Factory" timezone,
even if that's exactly what the system is using. Reporting time as
GMT seems better than that.

- In pgsql/src/backend/utils/misc/guc.c, remove GUC extra_desc strings
that are redundant with the enum value lists.

- In pgsql/src/backend/utils/adt/xml.c, fix transaction-lifespan
memory leak in xpath(). Report by Matt Magoffin, fix by Kris Jurka.

- Fix psql's \d and allied commands to work with all server versions
back to 7.4. Guillaume Lelarge, with some additional fixes by me.

- Add a function pg_get_keywords() to let clients find out the set of
keywords known to the SQL parser. Dave Page. (See the example after
this list.)

- In pgsql/src/backend/utils/misc/guc.c, prevent integer overflows
during units conversion when displaying a GUC variable that has
units. Per report from Stefan Kaltenbrunner. Backport to 8.2. I
also backported my patch of 2007-06-21 that prevented comparable
overflows on the input side, since that now seems to have enough
field track record to be back-patched safely. That patch included
addition of hints listing the available unit names, which I did not
bother to strip out of it --- this will make a little more work for
the translators, but they can copy the translation from 8.3, and
anyway an untranslated hint is better than no hint.

Magnus Hagander committed:

- In pgsql/src/backend/utils/misc/guc.c, split apart
message_level_options into one set for server-side settings and one for
client-side, restoring the previous behaviour with different sort order
for the 'log' level. Also, remove redundant list of available options,
since the enum code will output it automatically.

- In pgsql/src/backend/utils/misc/guc.c, "debug" level was supposed to
be hidden, since it's just an alias for debug2.

- In pgsql/src/backend/port/win32_shmem.c, fix a couple of bugs in
win32 shmem name generation: 1. Don't cut off the prefix. With this
fix, it's again readable. 2. Properly store it in the Global
namespace as intended.

Joe Conway committed:

- When an ERROR happens on a dblink remote connection, take pains to
pass the ERROR message components locally, including using the
passed SQLSTATE. Also wrap the passed info in an appropriate
CONTEXT message. Addresses complaint by Henry Combrinck. Joe
Conway, with much good advice from Tom Lane.

Peter Eisentraut committed:

- In pgsql/src/test/regress/expected/prepare.out, clean up weird
whitespace. Separate patch to simplify the next change.

- Don't print the name of the database in psql \z.

- Don't refer to the database name "regression" inside the regression
test scripts, to allow running the test successfully with another
database name.
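
As a quick illustration of the pg_get_keywords() item above (an editor's
sketch, not part of the newsletter; the column names are from the committed
function as best recalled):

SELECT * FROM pg_get_keywords();
-- one row per parser keyword; columns: word, catcode, catdesc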

== Rejected Patches (so far) ==

No one was disappointed this week :-)

== Submitted Patches ==

Simon Riggs sent in a patch which introduces a distinction between
hint bit setting and block dirtying, when such a distinction can
safely be made.

Simon Riggs sent in a patch which adds planner statistic hooks.

Tom Raney sent in another revision of his patch to allow EXPLAIN to
output XML.

Peter Eisentraut sent in a patch to let people set the name of the
regression test database on the command line.

Teodor Sigaev sent in revisions to the multicolumn and fast-insert
patches to GIN.

Dean Rasheed sent in a patch to add a debug_explain_plan GUC variable
which, when set to on, dumps the output of EXPLAIN ANALYZE to the
appropriate logging level.

Zdenek Kotala sent in three more revisions of his page macros cleanup
patch.

Garick Hamlin sent in a patch to support ident authentication when
using unix domain sockets on Solaris.

Simon Riggs sent in a bug fix for pg_standby per note from Ferenc
Felhoffer.

Simon Riggs sent in two revisions of a patch to document the
toggliness of certain psql commands.

Simon Riggs sent in a patch to pgbench which restricts vacuuming to
pgbench tables and changes a DELETE/VACUUM to a TRUNCATE.

--
Andreas 'ads' Scherbaum
Deutsche PostgreSQL User Group: http://www.pgug.de
DPWN: http://ads.wars-nicht.de/blog/categories/18-PWN


--
Sent via pgsql-de-allgemein mailing list (pgsql-de-allgemein@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-de-allgemein

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:

>> So, the upshot is that the = and <> operators are not locale-aware,
>> yes? They just do byte comparisons. Is that really the way it
>> should be? I mean, could there not be strings that are equivalent
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized
> strings. But Postgres currently assumes that all UTF8 strings are
> normalized.

I see. So binary equivalence is okay, in that case.

> If you need to implement the < <= >= > operators, you need to use
> strcoll(), which takes care of locale collation.

Which varstr_cmp() does, I guess. It's what textlt uses, for example.

> See unicode collation algorithm http://www.unicode.org/reports/tr10/

Wow, that looks like a fun read.

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

On Jul 7, 2008, at 13:10, Tom Lane wrote:

> We intentionally force such strings to be considered non-equal.
> See varstr_cmp, and if you like see the archives from back when
> that was put in.
>
> The = and <> operators are in fact consistent with the behavior of
> varstr_cmp (and had better be!); they're just optimized a bit.

Thank you, Tom. I'll post my updated patch shortly.

In the meantime, can anyone suggest an easy way to populate a table
full of random strings so that I can do a bit of benchmarking?

Thanks,

David
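
One common recipe (an editor's sketch, not from the thread; the table name
and row count are arbitrary):

-- Fill a table with a million random 32-character hex strings.
CREATE TABLE bench (txt text);
INSERT INTO bench
SELECT md5(random()::text)
FROM generate_series(1, 1000000);
-- md5() output is all lowercase; upper-case a random half of the rows
-- so case-folding actually has work to do.
UPDATE bench SET txt = upper(txt) WHERE random() < 0.5;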


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

"David E. Wheeler" <david@kineticode.com> writes:
> So, the upshot is that the = and <> operators are not locale-aware,
> yes? They just do byte comparisons. Is that really the way it should
> be? I mean, could there not be strings that are equivalent but have
> different bytes?

We intentionally force such strings to be considered non-equal.
See varstr_cmp, and if you like see the archives from back when
that was put in.

The = and <> operators are in fact consistent with the behavior of
varstr_cmp (and had better be!); they're just optimized a bit.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] To store and retrieve image data in postgresql

For hot backup and restore, check this:

http://www.postgresql.org/docs/8.3/static/continuous-archiving.html

For logical backup (dump), use this:

http://www.postgresql.org/docs/8.3/static/app-pgdump.html

--- On Mon, 7/7/08, aravind chandu <avin_friends@yahoo.com> wrote:

> From: aravind chandu <avin_friends@yahoo.com>
> Subject: [GENERAL] To store and retrieve image data in postgresql
> To: pgsql-general@postgresql.org
> Date: Monday, July 7, 2008, 7:22 PM
> Hello,
>
> I need to store an image in a postgresql database, and after that I need
> to retrieve the image back. Can you please help me how to do this?
>
> Thank You,
>
> Avinash.
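
As for the storage question itself, the two usual options look roughly like
this (an editor's sketch; the table, column, and file path are hypothetical):

-- Option 1: a bytea column; the client sends and receives the raw bytes,
-- typically as a bound parameter.
CREATE TABLE images (id serial PRIMARY KEY, name text, data bytea);

-- Option 2: a large object; lo_import() runs in the server process, so the
-- path must be readable on the server machine.
SELECT lo_import('/tmp/photo.png');  -- returns the new large object's OID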



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] [pg_gen] what was the "issue" with the 8.3.2 bundle ?

greetings, all ---

in the announcement for the 8.3.3 bundle,
about four weeks ago, now,
it was stated that the 8.3.2 bundle had "issues",
but, their nature was not described.

it was stated that
these tarballs were only available for "a few days".

so, the cat in me being curious,
today, i went looking in the -bugs list archives
from may_30 to jun_13, by date,
examining only the initial post for each thread [ to save time ].
i find no entries that refer to a member of the bundle in question.
i did the same for the -general list archives, from jun_13,
but, i stopped on jun_05 when i found the post asking
when 8.3.2 would be released.

i was wondering: what was it ?
further, is this, per_chance, described, in excruciating detail,
in a location where i did not tread ?

keep up the good work.

please cc.

tia.

rob


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] PATCH: CITEXT 2.0

2008/7/7 David E. Wheeler <david@kineticode.com>:
> On Jul 7, 2008, at 12:36, Pavel Stehule wrote:
>
>>> * Does it need to be locale-aware or not?
>>
>> yes!
>
> texteq() in varlena.c does not seem to be.
>

it's case-sensitive, and it's more or less a binary compare
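
For instance (an editor's illustration):

SELECT 'Foo'::text = 'foo'::text;  -- false: the bytes differ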

Regards
Pavel Stehule

> Best,
>
> David
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0

David E. Wheeler wrote:
> On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
>
>> My question is: why? Shouldn't they all use the same function for
>> comparison? I'm happy to dupe this implementation for citext, but I
>> don't understand it. Should not all comparisons be executed consistently?
>
> Let me try to answer my own question by citing this comment:
>
> /*
> * Since we only care about equality or not-equality, we can avoid all the
> * expense of strcoll() here, and just do bitwise comparison.
> */
>
> So, the upshot is that the = and <> operators are not locale-aware, yes?
> They just do byte comparisons. Is that really the way it should be? I
> mean, could there not be strings that are equivalent but have different
> bytes?

Correct. The problem is complex. It works fine only for normalized strings.
But Postgres currently assumes that all UTF8 strings are normalized.

If you need to implement the < <= >= > operators, you need to use strcoll(),
which takes care of locale collation.

See unicode collation algorithm http://www.unicode.org/reports/tr10/
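
To make the normalization point concrete (an editor's sketch; the byte
escapes assume a UTF8 database and E'' escape-string syntax):

-- "é" as one precomposed code point vs. "e" plus a combining acute accent:
SELECT E'\xC3\xA9' = E'e\xCC\x81';  -- false: same rendered text, different bytes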

Zdenek


--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic

http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] Altering a column type w/o dropping views

On Mon, Jul 07, 2008 at 05:53:58PM +0800, Ow Mun Heng wrote:
> I'm going to alter the data type of a bunch of table columns, and I'm being
> forced to drop a view which depends on the column.

Why is that a problem? If you keep your object definitions in files
(e.g., in a directory structure that's under revision control) then
you can write a deployment script like the following (to be executed
via psql):

BEGIN;
DROP VIEW view_name;
ALTER TABLE table_name ALTER column_name TYPE type_name;
\i views/view_name.sql
COMMIT;

--
Michael Fuhr

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] creating "a perfect sequence" column

On Mon, Jul 7, 2008 at 4:54 AM, Jack Brown <zidibik@yahoo.com> wrote:

> Thank you, I didn't know the term, so when I searched the archives I hadn't found much.

Here is the result of a discussion that came up a couple of years ago:

http://www.varlena.com/GeneralBits/130.php

I guess that German accounting laws require a gapless sequence for all documents.
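
The classic technique, for flavor (an editor's sketch with hypothetical
names; the article above covers the details and caveats):

-- A single-row counter: UPDATE locks the row until commit, so two
-- transactions can never take the same number, and a rollback returns
-- the number instead of leaving a gap.
CREATE TABLE doc_counter (last_value integer NOT NULL);
INSERT INTO doc_counter VALUES (0);

BEGIN;
UPDATE doc_counter SET last_value = last_value + 1;
SELECT last_value FROM doc_counter;  -- use this as the document number
COMMIT;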

> In the meantime I did some research on other RDBMSs, and I guess what I was looking for was more along the lines of, e.g., the ROW_NUMBER() function in MSSQL [1]. FWIW, this seems like the most efficient pagination solution, assuming, of course, that the RDBMS implements this functionality efficiently.
>
> [1]: http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx

Actually, this functionality is most commonly implemented in a
reporting utility, but it can be done using SQL.
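
For example, before native ROW_NUMBER() support, a self-join can number rows
in plain SQL (an editor's sketch; the docs table is hypothetical, and the
join is O(n^2), which is why reporting tools usually do this instead):

SELECT count(b.id) AS row_number, a.id
FROM docs a
JOIN docs b ON b.id <= a.id
GROUP BY a.id
ORDER BY a.id;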

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [pgsql-de-allgemein] Pattern Matching

Florian Aumeier wrote:
> How can I exclude a character sequence in a pattern in Postgres?
>
> As an example, two different URLs. The first URL should be matched,
> the second should not:
>
> a) 'http://asbojesus.wordpress.com/2007/03/02/14/'
> b) 'http://feeds.wordpress.com/1.0/goreddit/globolibro.wordpress.com/319/'

I don't know whether it can be done with a _single_ pattern.

I would use two (note the doubled backslashes: inside an E'' string,
\\. is needed to get a literal \. through to the regex engine):

(col ~ E'^http://[a-zA-Z0-9]*\\.wordpress\\.com') AND
(col !~ E'^http://[a-zA-Z0-9]*feeds.*\\.wordpress\\.com')

or something along those lines.

Kind regards,
Laurenz Albe

--
Sent via pgsql-de-allgemein mailing list (pgsql-de-allgemein@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-de-allgemein

Re: [GENERAL] creating "a perfect sequence" column

> On Sun, Jul 6, 2008 at 7:33 PM, Berend Tober wrote:
> > This question comes up a lot. A term used in prior discussions is
> > "gapless sequence".
>
Thank you, I didn't know the term, so when I searched the archives I hadn't found much.

> > What would be really more interesting for discussion on this community
> > forum is a detailed description of your actual use case and requirements.
>
It's sort of a formal document management system. The assigned gapless numbers are frozen at the end of each month. Until then, an authorized user will be able to delete a document. But ...

> reusing numbers already shown to a user is a recipe for a disaster. they
> write down the number, and two weeks later reference it, but it's not there.
>
... when I mentioned this to them (management), they accepted that it was not a really good idea after all. Thanks a lot :) I will assign numbers when closing the month, so everything will be fine.

In the meantime I did some research on other RDBMSs, and I guess what I was looking for was more along the lines of, e.g., the ROW_NUMBER() function in MSSQL [1]. FWIW, this seems like the most efficient pagination solution, assuming, of course, that the RDBMS implements this functionality efficiently.

[1]: http://www.davidhayden.com/blog/dave/archive/2005/12/30/2652.aspx


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general