I ran the explain analyze.Here is what i got:
"Group (cost=112266.37..112266.40 rows=1 width=56) (actual time=5583.399..5615.476 rows=13373 loops=1)"
" -> Sort (cost=112266.37..112266.38 rows=1 width=56) (actual time=5583.382..5590.890 rows=13373 loops=1)"
" Sort Key: lane_data_07_08.lane_id, lane_data_07_08.measurement_start, lane_data_07_08.measurement_end, lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy, lane_data_07_08.quality, lane_data_07_08.effective_date"
" -> Nested Loop IN Join (cost=0.00..112266.36 rows=1 width=56) (actual time=1100.307..5547.768 rows=13373 loops=1)"
" -> Seq Scan on lane_data_07_08 (cost=0.00..112241.52 rows=3 width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"
" Filter: (((volume = 255::double precision) OR (speed = 255::double precision) OR (occupancy = 255::double precision) OR (occupancy >= 100::double precision) OR (volume > 52::double precision) OR (volume < 0::double precision) OR (speed > 120::double precision) OR (speed < 0::double precision)) AND (date_part('hour'::text, measurement_start) >= 5::double precision) AND (date_part('hour'::text, measurement_start) <= 23::double precision) AND (date_part('day'::text, measurement_start) = 1::double precision))"
" -> Index Scan using lane_info_pk on lane_info (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=20581)"
" Index Cond: (lane_data_07_08.lane_id = lane_info.lane_id)"
" Filter: (inactive IS NULL)"
"Total runtime: 5621.409 ms"
Well instaed of creating extra indexes (since they eat up lot of space) i made use of the whole measurement_start field, so thet it uses the index proeprty and makes the search faster.
So i changed the query to include the measuerment start as follows:
SELECT lane_id,measurement_start,
measurement_end,speed,volume,occupancy,quality,effective_date
FROM tss.lane_data_06_08
WHERE lane_id in(select lane_id from lane_info where inactive is null )
AND measurement_start between '2008-06-30 05:00:00-04' AND '2008-06-30 23:00:00-04'GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
ORDER BY lane_id, measurement_start
Samantha
On 7/1/08, Scott Marlowe <
scott.marlowe@gmail.com> wrote:
> On Tue, Jul 1, 2008 at 1:29 PM, samantha mahindrakar
> <
sam.mahindrakar@gmail.com> wrote:
> > Hi
> > I have a select statement that runs on a partition having say couple
> > million rows.
> > The tabel has indexes on two colums. However the query uses the
> > non-indexed colums too in its where clause.
> > For example:
> > SELECT lane_id,measurement_start,
> > measurement_end,speed,volume,occupancy,quality,effective_date
> > FROM tss.lane_data_06_08
> > WHERE lane_id in(select lane_id from lane_info where inactive is null )
> > AND date_part('hour', measurement_start) between 5 and 23
> > AND date_part('day',measurement_start)=30
> > GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> > ORDER BY lane_id, measurement_start
> >
> > out of this only lane_id and mesaurement_start are indexed. This query
> > will return around 10,000 rows. But it seems to be taking a long time
> > to execute which doesnt make sense for a select statement. It doesnt
> > make any sense to create index for every field we are gonna use in tne
> > where clause.
> > Isnt there any way we can improve the performance?
>
> I'm guessing that adding an index for either
> date_part('hour',measurement_start) or
> date_part('day',measurement_start) or both would help.
>
> What does explain analyze select ... (rest of query here) say?
>