<sam.mahindrakar@gmail.com> wrote:
> I ran the explain analyze.Here is what i got:
>
>
> "Group (cost=112266.37..112266.40 rows=1 width=56) (actual
> time=5583.399..5615.476 rows=13373 loops=1)"
> " -> Sort (cost=112266.37..112266.38 rows=1 width=56) (actual
> time=5583.382..5590.890 rows=13373 loops=1)"
> " Sort Key: lane_data_07_08.lane_id,
> lane_data_07_08.measurement_start, lane_data_07_08.measurement_end,
> lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy,
> lane_data_07_08.quality, lane_data_07_08.effective_date"
> " -> Nested Loop IN Join (cost=0.00..112266.36 rows=1 width=56)
> (actual time=1100.307..5547.768 rows=13373 loops=1)"
> " -> Seq Scan on lane_data_07_08 (cost=0.00..112241.52 rows=3
> width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"
You can see here that the seq scan on lane_data is what's eating up
all your time. Also, since the row estimate is WAY off, it then chose
a nested loop thinking it would be joining up only 1 row and actually
running across 20k rows.
> " Filter: (((volume = 255::double precision) OR (speed =
> 255::double precision) OR (occupancy = 255::double precision) OR (occupancy
>>= 100::double precision) OR (volume > 52::double precision) OR (volume <
> 0::double precision) OR (speed > 120::double precision) OR (speed <
> 0::double precision)) AND (date_part('hour'::text, measurement_start) >=
> 5::double precision) AND (date_part('hour'::text, measurement_start) <=
> 23::double precision) AND (date_part('day'::text, measurement_start) =
> 1::double precision))"
> " -> Index Scan using lane_info_pk on
> lane_info (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1
> loops=20581)"
> " Index Cond: (lane_data_07_08.lane_id =
> lane_info.lane_id)"
> " Filter: (inactive IS NULL)"
> "Total runtime: 5621.409 ms"
>
>
> Well instaed of creating extra indexes (since they eat up lot of space) i
> made use of the whole measurement_start field, so thet it uses the index
> proeprty and makes the search faster.
> So i changed the query to include the measuerment start as follows:
>
> SELECT lane_id,measurement_start,
> measurement_end,speed,volume,occupancy,quality,effective_date
> FROM tss.lane_data_06_08
> WHERE lane_id in(select lane_id from lane_info where inactive is null )
> AND measurement_start between '2008-06-30 05:00:00-04' AND '2008-06-30
> 23:00:00-04'
> GROUP BY
> lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> ORDER BY lane_id, measurement_start
Yeah, anytime you can just compare date / timestamp on an indexed
field you'll do better. If you find yourself needing to use the other
syntax, so you can, for instance, grab the data for 5 days in a row
from 5am to 11am or something, then the method I mentioned of making
indexes on date_part are a good choice. Note that you need regular
timestamp, not timstamptz to create indexes.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
No comments:
Post a Comment