with a few constants and two joins. All relevant columns should be
indexed, and I'm pretty sure there aren't any type conversion issues.
But the query plan includes a fairly heavy seq scan. The only possible
complication is that the tables involved are fairly large - hundreds of
millions of rows each.
Can anyone explain this? There should only ever be a maximum of about 50
rows returned when the query is executed.
Query:
select fls.function_verified, fls.score, fls.go_category_group_ref,
fs1.gene_ref, fs1.function_verified_exactly, fs2.gene_ref,
fs2.function_verified_exactly from functional_linkage_scores fls,
gene_prediction_view fs1, gene_prediction_view fs2 where fls.gene_ref1 =
fs1.gene_ref and fls.gene_ref2 = fs2.gene_ref and fs1.go_term_ref = 2
and fs2.go_term_ref = 2
Explain on query:
Merge Join (cost=1331863800.16..6629339921.15 rows=352770803726 width=22)
Merge Cond: (fs2.gene_ref = fls.gene_ref2)
-> Index Scan using gene_prediction_view_gene_ref on
gene_prediction_view fs2 (cost=0.00..6235287.98 rows=197899 width=5)
Index Cond: (go_term_ref = 2)
-> Materialize (cost=1331794730.41..1416453931.72 rows=6772736105
width=21)
-> Sort (cost=1331794730.41..1348726570.67 rows=6772736105
width=21)
Sort Key: fls.gene_ref2
-> Merge Join (cost=38762951.04..146537410.33
rows=6772736105 width=21)
Merge Cond: (fs1.gene_ref = fls.gene_ref1)
-> Index Scan using gene_prediction_view_gene_ref
on gene_prediction_view fs1 (cost=0.00..6235287.98 rows=197899 width=5)
Index Cond: (go_term_ref = 2)
-> Materialize (cost=38713921.60..41618494.20
rows=232365808 width=20)
-> Sort (cost=38713921.60..39294836.12
rows=232365808 width=20)
Sort Key: fls.gene_ref1
-> Seq Scan on
functional_linkage_scores fls (cost=0.00..3928457.08 rows=232365808
width=20)
\d on functional_linkage_scores (232241678 rows):
Table "public.functional_linkage_scores"
Column | Type |
Modifiers
-----------------------+---------------+------------------------------------------------------------------------
id | integer | not null default
nextval('functional_linkage_scores_id_seq'::regclass)
gene_ref1 | integer | not null
gene_ref2 | integer | not null
function_verified | boolean | not null
score | numeric(12,4) | not null
go_category_group_ref | integer | not null
go_term_ref | integer |
Indexes:
"functional_linkage_scores_pkey" PRIMARY KEY, btree (id)
"functional_linkage_scores_gene_ref1_key" UNIQUE, btree (gene_ref1,
gene_ref2, go_category_group_ref, go_term_ref)
"ix_functional_linkage_scores_gene_ref2" btree (gene_ref2)
Foreign-key constraints:
"functional_linkage_scores_gene_ref1_fkey" FOREIGN KEY (gene_ref1)
REFERENCES genes(id)
"functional_linkage_scores_gene_ref2_fkey" FOREIGN KEY (gene_ref2)
REFERENCES genes(id)
"functional_linkage_scores_go_category_group_ref_fkey" FOREIGN KEY
(go_category_group_ref) REFERENCES go_category_groups(id)
\d on gene_prediction_view (568654245 rows):
Table
"public.gene_prediction_view"
Column | Type
| Modifiers
----------------------------------+------------------------+-------------------------------------------------------------------
id | integer | not null
default nextval('gene_prediction_view_id_seq'::regclass)
gene_ref | integer | not null
go_term_ref | integer | not null
go_description | character varying(200) | not null
go_category | character varying(50) | not null
function_verified_exactly | boolean | not null
function_verified_with_parent_go | boolean | not null
score | numeric(12,4) | not null
prediction_method_ref | integer |
functional_score_ref | integer |
Indexes:
"gene_prediction_view_pkey" PRIMARY KEY, btree (id)
"gene_prediction_view_functional_score_ref_key" UNIQUE, btree
(functional_score_ref)
"gene_prediction_view_gene_ref" UNIQUE, btree (gene_ref,
go_term_ref, prediction_method_ref)
Foreign-key constraints:
"gene_prediction_view_functional_score_ref_fkey" FOREIGN KEY
(functional_score_ref) REFERENCES functional_scores(id)
"gene_prediction_view_gene_ref_fkey" FOREIGN KEY (gene_ref)
REFERENCES genes(id)
"gene_prediction_view_go_term_ref_fkey" FOREIGN KEY (go_term_ref)
REFERENCES go_terms(term)
...and just in case someone can give advice on more aggressive settings
that might help out the planner for this particular comptuer...
This computer: Mac Pro / 4 gigs ram / software Raid 0 across two hard
drives.
Production computer: Xeon 3ghz / 32 gigs ram / Debian
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
No comments:
Post a Comment