Friday, July 18, 2008

Re: [HACKERS] phrase search

I looked at the query operators for tsquery, and here are some proposed
new operators for position-based queries. I am just proposing some changes,
along with the questions I have.

1. What is the meaning of such a query operator?

foo #5 bar -> true if the document has the word "foo" followed by "bar" at
the 5th position.

foo #<5 bar -> true if the document has the word "foo" followed by "bar"
within 5 positions.

foo #>5 bar -> true if the document has the word "foo" followed by "bar"
after 5 positions.

Some other ways they can be used:

!(foo #<5 bar) -> true if the document never has "foo" followed by "bar"
within 5 positions.

etc .....
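For example, if to_tsquery() learned the proposed syntax, usage might look
like this (purely illustrative; none of these operators exist yet):

SELECT to_tsvector('foo baz baz bar') @@ to_tsquery('foo #3 bar');
-- true: "bar" occurs 3 positions after "foo"
SELECT to_tsvector('foo baz baz bar') @@ to_tsquery('foo #<2 bar');
-- false: "bar" is 3 positions away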

2. How to implement such query operators?

Should we modify QueryItem to include additional distance information, or
is there another way to accomplish it?

Is the following list sufficient to accomplish this?
a. Modify to_tsquery
b. Modify TS_execute in tsvector_op.c to check new operator

Is anything needed in the rewrite subsystem?

3. Are these valid uses of the operators, and if so, what would they
mean?

foo #5 (bar & cup)

If not, should the operator be applied only to two QI_VALs?

4. If the operator only applies to two query items, can we create an index
that maps (foo, bar) -> documents[min distance, max distance]?
How difficult would it be to implement such an index?


Thanks,
-Sushant.

On Thu, 2008-06-05 at 19:37 +0400, Teodor Sigaev wrote:
> > I can add index support and support for arbitrary distances between
> > lexemes.
> > It appears to me that supporting arbitrary boolean expressions will be
> > complicated. Can we pull out something from TSQuery?
>
> I don't much like the idea of having a separate interface for phrase
> search. Your patch could be a module, used by people who really want
> phrase search.
>
> Introducing a new operator in tsquery allows reuse of the existing tsquery
> infrastructure, such as concatenation (&&, ||, !!), the rewrite subsystem,
> etc. New operations/types designed specially for phrase search would mean
> doing that work all over again.
>


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [GENERAL] 10.5 OS X ppc64 problem

Shane Ambler <pgsql@Sheeky.Biz> writes:
> There has been some talk about getting postgres to build as a universal
> binary. The current makefiles don't support the option but a couple of
> people have come up with work arounds. I do believe that there are plans
> to add this to future releases.

FYI, we are making some progress:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00884.php

I am not sure we'll ever bother to fix the configuration-file issue,
but other than that it does seem to be feasible to build universal
binaries without major hacking on the source code as of CVS HEAD.
(I fixed the postgres.bki issue a few minutes ago.)

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[COMMITTERS] pgsql: Avoid substituting NAMEDATALEN, FLOAT4PASSBYVAL, and

Log Message:
-----------
Avoid substituting NAMEDATALEN, FLOAT4PASSBYVAL, and FLOAT8PASSBYVAL into
the postgres.bki file during build, because we want that file to be entirely
platform- and configuration-independent; else it can't safely be put into
/usr/share on multiarch machines. We can do the substitution during initdb,
instead. FLOAT4PASSBYVAL and FLOAT8PASSBYVAL are new breakage as of 8.4,
while the NAMEDATALEN hazard has been there all along but I guess no one
tripped over it. Noticed while trying to build "universal" OS X binaries.

Modified Files:
--------------
pgsql/src/backend/catalog:
genbki.sh (r1.44 -> r1.45)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/catalog/genbki.sh?r1=1.44&r2=1.45)
pgsql/src/bin/initdb:
initdb.c (r1.157 -> r1.158)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/bin/initdb/initdb.c?r1=1.157&r2=1.158)
pgsql/src/tools/msvc:
Genbki.pm (r1.4 -> r1.5)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/tools/msvc/Genbki.pm?r1=1.4&r2=1.5)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

[HACKERS] Getting to universal binaries for Darwin

Awhile back we determined that the big stumbling block for building
Postgres universal binaries for OS X was that we were using "ld"
to produce intermediate SUBSYS.o files, and it didn't want to preserve
multi-architecture components of input .o files. Peter got rid of that
hack recently, so I thought I'd see what it takes to build a universal
binary with CVS HEAD.

The good news is that the tools problem seems to be solved. If you
add something like "-arch i386 -arch ppc" to CFLAGS and build normally,
you get real working multiarch binaries and libraries. (At least it
works for me on OS X 10.5.4 --- no promises about older toolchains.)

The bad news is that if you only do that, only the arch that you
actually build on will work. We have configure set up to insert
various hardware-dependent definitions into pg_config.h and
ecpg_config.h, and if you don't have the right values visible for
each compilation, the resulting executables will fail.

You can get around that by hacking up the generated config files
with #ifdef __i386__ and so on to expose the correct values of
the hardware-dependent symbols to each build. Of course you have
to know what the correct values are --- if you don't have a sample
of each architecture handy to run configure against, it'd be easy
to miss some things. And even then it's pretty tedious. I am
not sure if it is possible or worth the trouble to try to automate
this part better.

The other big problem is that genbki.sh wants to propagate some of those
hardware-dependent symbols into postgres.bki, in particular
FLOAT4PASSBYVAL and FLOAT8PASSBYVAL. This is a complete nonstarter,
because we put postgres.bki under share/ which means it is supposed to
be architecture independent. (I'm really glad I realized this before we
released 8.4, because it'd violate Red Hat's multilib file rules...)
I think we can pretty easily fix that by moving the responsibility
for substituting these values into initdb, though.

Modulo the above problems, I was able to build i386+ppc binaries that
do in fact work on both architectures. I haven't got any 64-bit Apple
machines to play with, so there might be 64-bit issues I missed.
Still, this is a huge step forward compared to what was discussed here:
http://archives.postgresql.org/pgsql-general/2008-02/msg00200.php

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] temp table problem

I can't help suspecting that the two statements in question were run
in different sessions (or at least different transactions?).
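That is, something along these lines would produce exactly that error (a
sketch; temp tables are visible only to the session that created them):

-- session 1:
CREATE TEMP TABLE my_temp_table(first_name text, last_name text);

-- session 2 (a different connection):
REVOKE ALL ON TABLE my_temp_table FROM PUBLIC;
-- ERROR:  relation "my_temp_table" does not exist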

...Robert

On Fri, Jul 18, 2008 at 9:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Abbas" <abbas.butt@enterprisedb.com> writes:
>> I have come across a problem. When you try to access a temp table
>> created via SPI_EXEC, you get a table not found error.
>
>> SPI_EXEC("CREATE TEMP TABLE my_temp_table(first_name text, last_name
>> text)", UTILITY);
>> SPI_EXEC("REVOKE ALL ON TABLE my_temp_table FROM PUBLIC", UTILITY);
>
>> The second statement generates a table not found error, although the
>> first statement was successful.
>
> Works for me ...
>
> ret = SPI_execute("CREATE TEMP TABLE my_temp_table(first_name text, last_name text)", false, 0);
>
> if (ret != SPI_OK_UTILITY)
> elog(ERROR, "SPI_execute(CREATE) returned %d", ret);
>
> ret = SPI_execute("REVOKE ALL ON TABLE my_temp_table FROM PUBLIC", false, 0);
>
> if (ret != SPI_OK_UTILITY)
> elog(ERROR, "SPI_execute(REVOKE) returned %d", ret);
>
> What PG version are you testing? Maybe you need to show a complete
> test case, instead of leaving us to guess at details?
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PERFORM] An "obvious" index not being used

Daniele Varrazzo writes:

> I suspect the foo.account_id statistical data are not used at all in query:
> the query planner can only estimate the number of accounts to look for, not

You mentioned you bumped your default_statistics_target.
What did you increase it to?
My data sets are so "strange" that anything less than 350 gives many bad
plans.
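For what it's worth, the target can also be raised for just the problem
column rather than globally, which keeps statistics small elsewhere (a
sketch using the names from the quoted query):

ALTER TABLE foo ALTER COLUMN account_id SET STATISTICS 350;
ANALYZE foo;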

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [HACKERS] Load spikes on 8.1.11

On Fri, Jul 18, 2008 at 7:15 PM, David Fetter <david@fetter.org> wrote:
> On Fri, Jul 18, 2008 at 10:15:42AM +0530, Gurjeet Singh wrote:
> > On Fri, Jul 18, 2008 at 10:05 AM, Gurjeet Singh <singh.gurjeet@gmail.com>
> > wrote:
> >
> > > On Fri, Jul 18, 2008 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > > "Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
> > > > > During these spikes, in 'top' sessions we see the 'idle' PG
> > > > > processes consuming between 2 and 5% CPU, and since the box has
> > > > > 8 CPUs (2 sockets, each a quad-core Intel Xeon) and somewhere
> > > > > around 200 Postgres processes, the load spikes to above 200; and
> > > > > it does this very sharply.
> > > >
> > > > This looks like heavy contention for a spinlock.  You need to get a
> > > > higher-level analysis of what's happening before anyone can say much
> > > > more than that.
> > > >
> > > > Note that 8.1 is pretty much ancient history as far as scalability to
> > > > 8-core hardware goes.  You should probably consider updating to 8.3
> > > > before investing too much time in tracking down what's happening.
> > > > If you can still show the problem on 8.3 then there would be some
> > > > interest in fixing it ...
> > >
> > > Upgrading is on the cards, but not as high a priority as I would like
> > > it to be! This is a production box, and we desperately need some
> > > respite from these spikes.
> > >
> > > Can you please elaborate on what high-level diagnosis you would need?
> > >
> > > I just ran DROP SCHEMA _<slony schema> CASCADE; and it spiked again,
> > > on a very lightly loaded box!!
> > >
> > > Thanks for all your help.
> >
> > Would reducing the number of connections on the DB help in reducing the
> > spike?

> Just generally, reducing the number of connections to the DB will help
> in reducing resource consumption.

Will try this option, at least in the next schema upgrade or when setting
up Slony.

> When you first get a chance, use or set up a test environment where
> you can test the upgrade to 8.3.latest.

Based on the thread above, we seem to be moving towards a consensus on
upgrading. One of the major hurdles for our environment's upgrade is the
loss of implicit casts in 8.3.
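For example, this sort of thing works on 8.1 but fails on 8.3, where the
cast has to be made explicit (a quick sketch):

SELECT length(42);
-- 8.1: implicitly casts 42 to text; 8.3: ERROR: function length(integer) does not exist
SELECT length(42::text);
-- works on both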

Following is the environment we have:

select version();
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.11 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9)
(1 row)

oprofile does not appear to be installed on this box, so I will try to get it installed and send you some more details the next time this happens.

Thanks,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Re: [PERFORM] 3ware vs Areca

Jeffrey Baker writes:

> Their firmware is, frankly, garbage. In more than one instance we
> have had the card panic when a disk fails, which is obviously counter
> to the entire purpose of a RAID.

I have had similar problems with 3ware 9550 and 9650 cards.
Under FreeBSD I have seen constant crashes under heavy load.
I used to think it was just FreeBSD, but I saw a thread on StorageReview
where the same was happening under Linux.

> controllers from our database server and replaced them with HP P800s.

How is that working out?
Which RAID level? SAS/SATA?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[pgsql-www] How to contribute to site?

I am looking at https://pgweb.postgresql.org and don't see any pointers on
how one contributes to the pg doc project.

Also checked http://wiki.postgresql.org/wiki/Developer_FAQ and the wiki in
general.

Any URLs or other pointers on how to contribute?
In particular, I want to provide examples for this page:
http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html

I spent the better part of an afternoon figuring out how to make a dynamic
function to send data to the proper partition.

I recall there was a presentation in a PDF that had a sample, but I could
not find it anymore, nor could I find any other examples.

I figure I could share what I have so others have a starting point when
doing partitioning.
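In outline it is something like the following (a simplified sketch with
made-up names and ranges; my actual version computes the child table name
dynamically):

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- route each row to the child table covering its month
    IF NEW.logdate >= DATE '2008-07-01' AND NEW.logdate < DATE '2008-08-01' THEN
        INSERT INTO measurement_200807 VALUES (NEW.*);
    ELSIF NEW.logdate >= DATE '2008-08-01' AND NEW.logdate < DATE '2008-09-01' THEN
        INSERT INTO measurement_200808 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'date out of range: %', NEW.logdate;
    END IF;
    RETURN NULL;  -- the row has gone into a child; suppress the parent insert
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_measurement
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();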

--
Sent via pgsql-www mailing list (pgsql-www@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-www

Re: [GENERAL] UPDATE runs slow in a transaction

Viktor Rosenfeld <rosenfel@informatik.hu-berlin.de> writes:
> Postgres is indeed selecting a bad plan. Turns out that the index I
> created to speed up the UPDATE isn't used inside a transaction block.

That doesn't make any sense to me, and in fact I cannot replicate any
such behavior here. What PG version are you running, exactly?

The exact test case I tried is attached --- it's just your original
incomplete example with some dummy data created beforehand. I get
a plan using the tmp_id index in all supported PG versions.

regards, tom lane


drop table _struct, _rank;
create table _struct(token_index int, id int);
create table _rank(struct_ref int, pre int, post int);

insert into _struct select i, i from generate_series(1,1000) g(i);
insert into _rank select i, i, i from generate_series(1,1000) g(i);
analyze _struct;
analyze _rank;

begin;

-- add columns left_token, right_token and copy values from token_index
ALTER TABLE _struct ADD left_token integer;
ALTER TABLE _struct ADD right_token integer;
UPDATE _struct SET left_token = token_index;
UPDATE _struct SET right_token = token_index;

-- set left, right values for non-terminals
-- (use temporary table to get rid of joins between struct and rank)
CREATE TABLE tmp AS
SELECT r.pre, r.post, s.id, s.left_token, s.right_token
FROM _rank r, _struct s
WHERE r.struct_ref = s.id;

CREATE INDEX idx_tmp_pre_post ON tmp (pre, post);

UPDATE tmp SET left_token = (SELECT min(t2.left_token) FROM tmp t2 WHERE t2.pre >= tmp.pre AND t2.pre <= tmp.post);
UPDATE tmp SET right_token = (SELECT max(t2.right_token) FROM tmp t2 WHERE t2.pre >= tmp.pre AND t2.pre <= tmp.post);

-- copy left, right values for everything
CREATE INDEX tmp_id ON tmp (id);
--analyze tmp;
explain UPDATE _struct SET left_token = (SELECT DISTINCT left_token FROM tmp WHERE _struct.id = tmp.id);
-- the UPDATE above takes ages when called within a transaction
UPDATE _struct SET right_token = (SELECT DISTINCT right_token FROM tmp WHERE _struct.id = tmp.id);

-- clean up
DROP TABLE tmp;

rollback;

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [HACKERS] temp table problem

"Abbas" <abbas.butt@enterprisedb.com> writes:
> I have come across a problem. When you try to access a temp table
> created via SPI_EXEC, you get a table not found error.

> SPI_EXEC("CREATE TEMP TABLE my_temp_table(first_name text, last_name
> text)", UTILITY);
> SPI_EXEC("REVOKE ALL ON TABLE my_temp_table FROM PUBLIC", UTILITY);

> The second statement generates a table not found error, although the
> first statement was successful.

Works for me ...

ret = SPI_execute("CREATE TEMP TABLE my_temp_table(first_name text, last_name text)", false, 0);

if (ret != SPI_OK_UTILITY)
elog(ERROR, "SPI_execute(CREATE) returned %d", ret);

ret = SPI_execute("REVOKE ALL ON TABLE my_temp_table FROM PUBLIC", false, 0);

if (ret != SPI_OK_UTILITY)
elog(ERROR, "SPI_execute(REVOKE) returned %d", ret);

What PG version are you testing? Maybe you need to show a complete
test case, instead of leaving us to guess at details?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[GENERAL] using regexp_matches and array manipulation

Can anyone give me an example of how to use regexp_matches and use the
captured values?

For instance, if I have a delimited string "a,b,c" and I want to put
each letter into a variable so I can subsequently use those variables in
an insert statement, how would I do that? I know regexp_matches returns
a text array, but how do I assign the value to an array and then access
those values?

leaselog=# select regexp_matches('a,b,c','(.*),(.*),(.*)');
 regexp_matches
----------------
 {a,b,c}
(1 row)


I've tried select into, but that just created a table and didn't put the
values into an array variable.
leaselog=# select regexp_matches('a,b,c','(.*),(.*),(.*)') into foo;
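What I am after is something like this (a sketch of the usage I want):

select (regexp_matches('a,b,c','(.*),(.*),(.*)'))[2];  -- should give 'b'

-- or in a PL/pgSQL function:
-- DECLARE parts text[];
-- ...
-- parts := regexp_matches('a,b,c','(.*),(.*),(.*)');
-- ... then use parts[1], parts[2], parts[3] in the INSERT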


Thanks in advance!


Rgds,

Bret


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [pgsql-es-ayuda] problema con dump de una bd

--- On Thu, 17-Jul-08, Carlos Mendez <lucas1850@gmail.com> wrote:

> From: Carlos Mendez <lucas1850@gmail.com>
> Subject: [pgsql-es-ayuda] problema con dump de una bd
> To: pgsql-es-ayuda@postgresql.org
> Date: Thursday, July 17, 2008, 12:15 am
> Hello,
>
> A little while ago I used phpPgAdmin to make a dump of a database and
> everything went fine, but today I could not: phpPgAdmin only downloaded
> empty dump.sql files of 0 KB. So then I tried to use pg_dump from the
> command line, but that did not work either; it just tells me something
> like:
> bash: command not found
> I only want to dump the database to a .sql file in order to build another
> database, but neither phpPgAdmin nor pg_dump is working for me. I will
> keep investigating.
> Thanks in advance for the help,
> regards.
>
> PS: I am using FC3, pgsql 8.3.3

Note that in version 8.3.3 the database name goes after the options.
For example:

pg_dump -U postgres -E UTF8 -c other options.... database_name > file.sql

-E (optional, to convert the dump to the encoding we need on another server)
-c (optional, so that it drops objects before creating the new ones)

Regards,
Gabriel Colina


--
TIP 2: you can unsubscribe from all the lists at the same time
(send "unregister YourEmailAddress" to majordomo@postgresql.org)

Re: [JDBC] Problem with java infinity mapping to postgres real column type

"Johan Miller" <johanmiller@gmail.com> writes:
> On Thu, Jul 17, 2008 at 08:22, Kris Jurka <jurka@ejurka.com> wrote:
>> I see this in the 8.3 release notes:
>>
>> Fix float4/float8 to handle Infinity and NAN (Not A Number) consistently
>> (Bruce) The code formerly was not consistent about distinguishing Infinity
>> from overflow conditions.

> What would be the options for handling this on earlier versions where it
> is not fixed? Is there any possibility that this will get changed in the
> 8.1 and 8.2 branches as well?

I believe the conclusion was that that change shouldn't be back-patched
because it would be a behavioral change that might break applications.

regards, tom lane

--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc

Re: [GENERAL] Backup/Restore of single table in multi TB database

Simon Riggs wrote:
> Have a look at pg_snapclone. It's specifically designed to significantly
> improve dump times for very large objects.
>
> http://pgfoundry.org/projects/snapclone/
>
Also, in case the original poster is not aware, pg_dump can back up
single tables.
Just add -t <table name>.

Does pg_snapclone work mostly on large rows, or will it also be faster
than pg_dump for narrow tables?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[pgsql-es-ayuda] Definir campos xml

Hello. I want to build an application using the native XML support that
PostgreSQL 8.3.3 ships with. The question is: how do I define the fields
that the XML carries so the system can distinguish which are numeric, text,
or of other types, or are they all stored as text?

For example:

CREATE TABLE test (
    id int4 NOT NULL,
    xml text,
    CONSTRAINT pk PRIMARY KEY (id)
);

I know from what I have seen that an xml field can receive several fields,
where the schema for the XML is defined; sorry for my ignorance :(

Any help will be welcome.
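What I would like to be able to do is something like the following (a
sketch; the /item/price path is only an example):

CREATE TABLE test (
    id  int4 NOT NULL,
    doc xml,                      -- 8.3's native xml type instead of plain text
    CONSTRAINT pk PRIMARY KEY (id)
);

-- extract a value and cast it explicitly to the type needed
-- (xpath() returns xml[], so subscript first):
SELECT (xpath('/item/price/text()', doc))[1]::text::numeric FROM test;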





Re: [GENERAL] tsearch dictionary list?

On Fri, 18 Jul 2008, Daniel Chiaramello wrote:

> Hi all.
>
> I'm new to the postgresql world, and I have to extend an existing product
> for Thailand - that product has some features based on tsearch, and I was
> wondering if there is an existing dictionary for that language... I failed
> to find any reference to such a dictionary on the web, and of course I
> don't speak Thai at all!

Read the documentation first and you'll be surprised. In brief,
tsearch supports all the OpenOffice dictionaries:
http://wiki.services.openoffice.org/wiki/Dictionaries
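For example, once the .dict/.affix files are copied into
$SHAREDIR/tsearch_data, hooking one up is just (a sketch; the thai file
names are hypothetical):

CREATE TEXT SEARCH DICTIONARY thai_ispell (
    TEMPLATE  = ispell,
    DictFile  = thai,
    AffFile   = thai,
    StopWords = thai
);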


>
> In fact, is there a list somewhere of existing tsearch dictionaries? Are
> Japanese, Chinese, or other "exotic" languages supported by tsearch?
>
> Thanks for your attention,
> Daniel Chiaramello
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Reducing memory usage of insert into select operations?

On 4:55 pm 07/18/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The only thing I can think of is that you had a huge number of rows
> with all the same hash value, so that there wasn't any way to split
> the batch into smaller sections. What are the join keys exactly in
> this query, and what can you tell us about their data distributions?

I can't share the actual table or column names, so I am posting the actual
select and explain with all names changed.

insert into customer_transactions
(record_id, date, type, amount, ids, groupid)
select
ca.record_id, coh.date, coh.type, coh.amount, coh.ids, ids.groupid
from
customer_original_historical coh,
cards ca,
customer_ids ids
where
ca.natural_key = coh.natural_key
and ids.ids = coh.ids
and coh.yearmo > '200703';

Hash Join  (cost=712213.57..27293913.33 rows=234402352 width=24)
  Hash Cond: (coh.id = ids.id)
  ->  Hash Join  (cost=551387.26..18799378.16 rows=234402352 width=22)
        Hash Cond: (coh.user_id = ca.user_id)
        ->  Seq Scan on customer_original_historical coh
              (cost=0.00..6702501.40 rows=234402352 width=47)
              Filter: (yearmo > '200703'::bpchar)
        ->  Hash  (cost=268355.67..268355.67 rows=14637567 width=32)
              ->  Seq Scan on cards ca
                    (cost=0.00..268355.67 rows=14637567 width=32)
  ->  Hash  (cost=77883.25..77883.25 rows=5055525 width=6)
        ->  Seq Scan on customer_ids ids
              (cost=0.00..77883.25 rows=5055525 width=6)

There was a single table, customer_original_historical, which was using a
natural key with a text field.

Most queries used customer_original_historical by itself or joined
against a single other table, which we should call "area".

The new schema I am testing is to split the one single file into 12 files
per month.

In addition I replaced the natural keys with a synthetic integer key.
I also replaced the "area" table with a customer_ids table which only has
two columns: synthetic key for historical and a region.

In order to have 12 tables per month I grouped all the regions into 12
groups. Queries are usually within a single region so what I am trying to
benchmark is if dividing 24 months of data into 24 sets of 12 regions will
perform better than a single large file.

The distribution of the joins is:
There are about 1,000,000 unique natural keys. Each natural key has on
average 15 rows per month.
ids are regions where the natural_keys are; figure tens of thousands of
natural_keys to an id.

Is that along the lines of what you were looking for?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [ADMIN] answer my question, please!!!

riakdr@gmail.com ("ria ulfa") writes:
> I want to copy a table from a file with the command:
>      COPY log FROM '/tmp/logfile.txt';
> but there is an error:
>      ERROR: could not open file "/tmp/logfile.txt" for reading: Permission denied
> what can I do to fix this error?
> thanks for the answer..

It's worth observing that the louder the demands are in the subject
line, the less likely people are to *want* to volunteer assistance.

The problem here is that COPY accesses data as whatever Unix user is
running the database, typically "postgres" or "pgsql".

Presumably that user doesn't have rights to access the file
"/tmp/logfile.txt". Perhaps that needs to be changed.
--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #145. "My dungeon cell decor will not
feature exposed pipes. While they add to the gloomy atmosphere, they
are good conductors of vibrations and a lot of prisoners know Morse
code." <http://www.eviloverlord.com/>

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: [HACKERS] Postgres-R: primary key patches

markus@bluegap.ch (Markus Wanner) writes:
> Hello Chris,
>
> chris wrote:
>> Slony-I does the same, with the "variation" that it permits the option
>> of using a "candidate primary key," namely an index that is unique+NOT
>> NULL.
>>
>> If it is possible to support that broader notion, that might make
>> addition of these sorts of logic more widely useful.
>
> Well, yeah, that's technically not much different, so it would
> probably be very easy to extend Postgres-R to work on any arbitrary
> Index.
>
> But what do we have primary keys for, in the first place? Isn't it
> exactly the *primary* key into the table, which you want to use for
> replication? Or do we need an additional per-table configuration
> option for that? A REPLICATION KEY besides the PRIMARY KEY?

I agree with you that tables are *supposed* to have primary keys;
that's proper design, and if tables are missing them, then something
is definitely broken.

Sometimes, unfortunately, people make errors in design, and we wind up
needing to accommodate situations that are "less than perfect."

The "happy happenstance" is that, in modern versions of PostgreSQL, a
unique index may be added in the background so that this may be
rectified without outage if you can live with a "candidate primary
key" rather than a true PRIMARY KEY.

It seems to me that this extension can cover a number of "design sins,"
which looks like a very kind accommodation, even though it is surely
preferable to design things in earlier rather than later.

>> I know Jan Wieck has in mind the idea of adding an interface to enable
>> doing highly efficient IUD (Insert/Update/Delete) via generating a way
>> to do direct heap updates, which would be *enormously* more efficient
>> than the present need (in Slony-I, for instance) to parse, plan and
>> execute thousands of IUD statements. For UPDATE/DELETE to work
>> requires utilizing (candidate) primary keys, so there is some
>> seemingly relevant similarity there.
>
> Definitely. The remote backend does exactly that for Postgres-R: it
> takes a change set, which consists of one or more tuple collections,
> and then applies these collections. See ExecProcessCollection() in
> execMain.c.
>
> (Although, I'm still less than thrilled about the internal storage
> format of these tuple collections. That can certainly be improved and
> simplified.)

You may want to have a chat with Jan; he's got some thoughts on a more
general purpose mechanism that would be good for this as well as for
(we think) extremely efficient bulk data loading.
--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #145. "My dungeon cell decor will not
feature exposed pipes. While they add to the gloomy atmosphere, they
are good conductors of vibrations and a lot of prisoners know Morse
code." <http://www.eviloverlord.com/>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers