Hello all,
I have been digging into the database page layout (specifically the tuples) to ensure the unsigned integer types were consuming the proper storage.
While digging around, I found one thing surprising:
It appears the heap tuples are padded at the end to the MAXALIGN distance.
Below is the data I used to come to this conclusion.
(This test was performed on a 64-bit system built with --with-blocksize=32, i.e. 32 kB pages.)
The goal was to compare data from comparable type sizes.
The first column indicates the type (char, uint1, int2, uint2, int4, and uint4),
the number in () indicates the number of columns in the table.
The Length is from the .lp_len field in the ItemId structure.
The Offset is from the .lp_off field in the ItemId structure.
The Size is the difference between consecutive offsets, i.e. the space each tuple actually occupies on the page.
char (1)    Length  Offset  Size      char (9)    Length  Offset  Size
                25   32736    32                      33   32728    40
                25   32704    32                      33   32688    40
                25   32672    32                      33   32648    40
                25   32640                            33   32608

uint1 (1)   Length  Offset  Size      uint1 (9)   Length  Offset  Size
                25   32736    32                      33   32728    40
                25   32704    32                      33   32688    40
                25   32672    32                      33   32648    40
                25   32640                            33   32608

int2 (1)    Length  Offset  Size      int2 (5)    Length  Offset  Size
                26   32736    32                      34   32728    40
                26   32704    32                      34   32688    40
                26   32672    32                      34   32648    40
                26   32640                            34   32608

uint2 (1)   Length  Offset  Size      uint2 (5)   Length  Offset  Size
                26   32736    32                      34   32728    40
                26   32704    32                      34   32688    40
                26   32672    32                      34   32648    40
                26   32640                            34   32608

int4 (1)    Length  Offset  Size      int4 (3)    Length  Offset  Size
                28   32736    32                      36   32728    40
                28   32704    32                      36   32688    40
                28   32672    32                      36   32648    40
                28   32640                            36   32608

uint4 (1)   Length  Offset  Size      uint4 (3)   Length  Offset  Size
                28   32736    32                      36   32728    40
                28   32704    32                      36   32688    40
                28   32672    32                      36   32648    40
                28   32640                            36   32608
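To make the pattern in the tables explicit, here is a minimal sketch of the rounding that would produce the Size column. It assumes MAXALIGN is 8 bytes on this 64-bit build; the names align_up, SKETCH_MAXALIGN and padded_size are made up for illustration and are not taken from the PostgreSQL source.

```c
#include <stddef.h>

/* Assumption: 8-byte maximum alignment on this 64-bit build. */
#define SKETCH_MAXALIGN 8

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Space a tuple would occupy on the page if its lp_len is padded to MAXALIGN. */
static size_t padded_size(size_t lp_len)
{
    return align_up(lp_len, SKETCH_MAXALIGN);
}
```

With these definitions, lengths 25, 26 and 28 all round up to 32, and lengths 33, 34 and 36 all round up to 40, which matches the difference between consecutive offsets in every table.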
From the documentation at: http://www.postgresql.org/docs/8.3/static/storage-page-layout.html
and from the comments in src/include/access/htup.h I understand the offset of the user data (indicated by t_hoff)
must be a multiple of the MAXALIGN distance, but I did not find anything suggesting the heap tuple itself
had this requirement.
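As a cross-check on the Length column, the sketch below rebuilds the observed lengths from t_hoff. It assumes a 23-byte fixed header (the figure the page-layout documentation gives for most machines), no null bitmap, an 8-byte MAXALIGN, and fixed-width non-null columns of a single type so that no padding is needed between them. All names here are hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAXALIGN     8   /* assumption: 64-bit build */
#define SKETCH_HEADER_BYTES 23  /* assumption: fixed header size, no null bitmap */

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Where the user data starts: the header size rounded up to MAXALIGN. */
static size_t sketch_t_hoff(void)
{
    return align_up(SKETCH_HEADER_BYTES, SKETCH_MAXALIGN);
}

/* Expected lp_len for a tuple carrying data_bytes of user data. */
static size_t sketch_tuple_len(size_t data_bytes)
{
    return sketch_t_hoff() + data_bytes;
}
```

Under those assumptions t_hoff is 24, so one 1-byte column gives 25, five 2-byte columns give 34, and three 4-byte columns give 36, in line with the measured lengths.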
After a cursory glance at the HeapTupleHeaderData structure, it appears it could be aligned with
INTALIGN instead of MAXALIGN. The one structure I was worried about was the 6-byte t_ctid
structure. The comments in the src/include/storage/itemptr.h file indicate the ItemPointerData structure
is composed of 3 int16 fields. So everything in the HeapTupleHeaderData structure is 32 bits or less.
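For reference, a simplified mirror of that layout looks roughly like the following. The struct names SketchBlockId and SketchItemPointer are made up for illustration; the field names follow the ones used in the headers, and the size and alignment mentioned below are what I would expect from common compilers on x86-64, not a guarantee for every platform.

```c
#include <stdint.h>

/* A 32-bit block number stored as two 16-bit halves. */
typedef struct
{
    uint16_t bi_hi;
    uint16_t bi_lo;
} SketchBlockId;

/* Block number plus a 16-bit offset number: three 16-bit fields in total. */
typedef struct
{
    SketchBlockId ip_blkid;
    uint16_t      ip_posid;
} SketchItemPointer;
```

Because no member is wider than 16 bits, the struct should be 6 bytes with 2-byte alignment, so by itself it would not force anything stricter than INTALIGN.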
I am interested in attempting to generate a patch if this idea appears feasible. For the data
set I am currently working with, it would save over 3GB of disk space. (Back-of-the-envelope calculations
indicate that 5% of my current storage is consumed by this padding. My tuple length is 44 bytes.)
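The per-tuple saving can be sketched as the difference between the two roundings. This assumes MAXALIGN is 8 and INTALIGN is 4, and the function names are hypothetical; it only covers the trailing padding of one tuple and does not try to reproduce the 5% estimate.

```c
#include <stddef.h>

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Bytes saved per tuple if it were padded to 4 bytes (assumed INTALIGN)
 * instead of 8 bytes (assumed MAXALIGN). */
static size_t padding_saved(size_t lp_len)
{
    return align_up(lp_len, 8) - align_up(lp_len, 4);
}
```

For a 44-byte tuple this gives 48 - 44 = 4 bytes saved per tuple; a tuple whose length is already a multiple of 8 saves nothing.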
Thanks,
- Ryan