Tuesday, June 10, 2008

Re: [HACKERS] RFD: ALTER COLUMN .. SET STORAGE COMPRESSED;

On Tue, Jun 10, 2008 at 5:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dawid Kuroczko" <qnex42@gmail.com> writes:
>> As we already have four types of ALTER COLUMN .. SET STORAGE
>> { PLAIN | EXTERNAL | EXTENDED | MAIN } I would like to add
>> "COMPRESSED" which would force column compression (if column is
>> smaller than some minimum, I guess somewhere between 16 and 32 bytes).
>
> Please see previous discussions about per-column toasting parameters,
> for instance
> http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
> http://archives.postgresql.org/pgsql-general/2007-08/msg01129.php
>
> I think the general consensus was that we want more flexible access to
> the compression knobs than just another STORAGE setting.

Sounds like the right way to do it. Perhaps the syntax should be something like:

ALTER TABLE tab ALTER COLUMN x WITH (storage_parameter = value, ...);

With storage parameters like:
compress -- enable/disable compression (like PLAIN or EXTERNAL)
min_input_size -- don't compress if smaller than size
min_comp_rate -- leave uncompressed if rate is smaller than this
toast -- for out-of-line storage parameters?
compression_algo -- for specifying alternative algorithms if any
(per Alvaro's suggestion).
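
To make the intended semantics of the first three parameters concrete, here is a minimal sketch in C. The struct and function names are hypothetical, and I am assuming min_comp_rate is the required saving expressed as a percentage of the raw size; nothing here is existing PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column options; field names follow the list above. */
typedef struct ColumnCompressOpts
{
    bool   compress;        /* false: never compress (like PLAIN or EXTERNAL) */
    size_t min_input_size;  /* values shorter than this are left alone */
    int    min_comp_rate;   /* assumed: required saving in percent (0-99) */
} ColumnCompressOpts;

/* Cheap pre-check, done before spending any CPU on compression. */
static bool
worth_trying(const ColumnCompressOpts *opts, size_t raw_len)
{
    return opts->compress && raw_len >= opts->min_input_size;
}

/* Post-check: keep the compressed copy only if it saves enough space. */
static bool
keep_compressed(const ColumnCompressOpts *opts, size_t raw_len, size_t comp_len)
{
    assert(opts->min_comp_rate >= 0 && opts->min_comp_rate <= 99);

    if (comp_len >= raw_len)
        return false;       /* never store something that did not shrink */

    /* saved / raw >= rate / 100, kept in integer arithmetic */
    return (raw_len - comp_len) * 100 >= raw_len * (size_t) opts->min_comp_rate;
}
```

With min_input_size = 32 and min_comp_rate = 25, a 16-byte value is never even tried, and a 100-byte value is kept compressed only if the result is 75 bytes or smaller.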

Perhaps it would be wise to introduce GUCs with default values (as we have now
ALTER COLUMN .. SET STATISTICS and default_statistics_target), named
for example:
default_column_min_input_size (and so on).
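
The fallback could work the way SET STATISTICS -1 reverts a column to default_statistics_target. A rough sketch, assuming a -1 sentinel for "not set on this column"; the sentinel macro, the function name, and the default of 32 are all made up for illustration:

```c
#include <assert.h>

/* Hypothetical sentinel: the column has no explicit setting. */
#define COLUMN_OPT_UNSET (-1)

/* Stand-in for the proposed GUC; the value 32 is an arbitrary example. */
static int default_column_min_input_size = 32;

/* Resolve the value actually used for a column. */
static int
effective_min_input_size(int column_setting)
{
    assert(column_setting >= COLUMN_OPT_UNSET);

    return (column_setting == COLUMN_OPT_UNSET)
        ? default_column_min_input_size
        : column_setting;
}
```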

The existing ALTER COLUMN .. SET STORAGE ... forms should become aliases for
WITH (...) and be deprecated, I guess.

The HEAP_HASEXTERNAL infomask bit should probably be used to "trigger"
the TOASTing code. Perhaps it should be renamed then? I am worried that storage
parameters might introduce overhead in key parts of PostgreSQL.

...as for compression_algo, perhaps it could be an oid of the compression
function(s) (we need to decompress too). Also, we would need to store which
algo was used to compress the column. Perhaps a byte between the varlena
header and the actual compressed data (this way we could have multiple algos
in use simultaneously).
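
A rough sketch of the algorithm-id byte idea, in plain C. Everything here is hypothetical: a real version would look the functions up by oid and sit behind the varlena header, while this one uses a fixed table and two toy codecs (a byte copy and a trivial run-length coder) just to show the framing and the dispatch on read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec signature: returns bytes written, or 0 on failure. */
typedef size_t (*codec_fn)(const unsigned char *src, size_t src_len,
                           unsigned char *dst, size_t dst_cap);

/* Stand-in "algorithm" that only copies bytes. */
static size_t
copy_codec(const unsigned char *src, size_t src_len,
           unsigned char *dst, size_t dst_cap)
{
    if (src_len == 0 || src_len > dst_cap)
        return 0;
    memcpy(dst, src, src_len);
    return src_len;
}

/* Toy run-length coding: output is a sequence of (count, byte) pairs. */
static size_t
rle_compress(const unsigned char *src, size_t src_len,
             unsigned char *dst, size_t dst_cap)
{
    size_t out = 0, i = 0;

    while (i < src_len)
    {
        size_t run = 1;

        while (i + run < src_len && src[i + run] == src[i] && run < 255)
            run++;
        if (out + 2 > dst_cap)
            return 0;
        dst[out++] = (unsigned char) run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

static size_t
rle_decompress(const unsigned char *src, size_t src_len,
               unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;

    if (src_len % 2 != 0)
        return 0;
    for (size_t i = 0; i < src_len; i += 2)
    {
        size_t run = src[i];

        if (run == 0 || out + run > dst_cap)
            return 0;
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}

/* The id byte stored with each value is an index into this table. */
static const struct { codec_fn compress, decompress; } algos[] = {
    {copy_codec, copy_codec},       /* id 0 */
    {rle_compress, rle_decompress}, /* id 1 */
};
#define N_ALGOS (sizeof(algos) / sizeof(algos[0]))

/* Write one algorithm-id byte, then the compressed payload. */
static size_t
datum_compress(unsigned char algo_id, const unsigned char *src, size_t src_len,
               unsigned char *dst, size_t dst_cap)
{
    size_t n;

    assert(src != NULL && dst != NULL);
    if (algo_id >= N_ALGOS || dst_cap < 1)
        return 0;
    n = algos[algo_id].compress(src, src_len, dst + 1, dst_cap - 1);
    if (n == 0)
        return 0;
    dst[0] = algo_id;
    return n + 1;
}

/* Read the id byte back and dispatch to the matching decompressor. */
static size_t
datum_decompress(const unsigned char *src, size_t src_len,
                 unsigned char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    if (src_len < 2 || src[0] >= N_ALGOS)
        return 0;
    return algos[src[0]].decompress(src + 1, src_len - 1, dst, dst_cap);
}
```

Because the reader only looks at the stored byte, values written with different algos can sit in the same column, and changing the column's compression_algo would not force a rewrite of old rows.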

Speaking of algorithms, I think that e2compr (ext2 filesystem with transparent
compression) could be a nice source of input in this area.

http://e2compr.sourceforge.net/
(Having algos as plugins would allow us to use foreign licenses (gzip) or
even patented algos in countries where software patents are prohibited
without risking anything in core PostgreSQL.)

OK, enough for today. Good night.

Regards,
Dawid
--
Solving [site load issues] with [more database replication] is a lot
like solving your own personal problems with heroin - at first it
sorta works, but after a while things just get out of hand.
