Wednesday, July 2, 2008

Re: [PERFORM] Select running slow on Postgres

I ran the explain analyze.Here is what i got:


"Group  (cost=112266.37..112266.40 rows=1 width=56) (actual time=5583.399..5615.476 rows=13373 loops=1)"
"  ->  Sort  (cost=112266.37..112266.38 rows=1 width=56) (actual time=5583.382..5590.890 rows=13373 loops=1)"
"        Sort Key: lane_data_07_08.lane_id, lane_data_07_08.measurement_start, lane_data_07_08.measurement_end, lane_data_07_08.speed, lane_data_07_08.volume, lane_data_07_08.occupancy, lane_data_07_08.quality, lane_data_07_08.effective_date"
"        ->  Nested Loop IN Join  (cost=0.00..112266.36 rows=1 width=56) (actual time=1100.307..5547.768 rows=13373 loops=1)"
"              ->  Seq Scan on lane_data_07_08  (cost=0.00..112241.52 rows=3 width=56) (actual time=1087.666..5341.662 rows=20581 loops=1)"
"                    Filter: (((volume = 255::double precision) OR (speed = 255::double precision) OR (occupancy = 255::double precision) OR (occupancy >= 100::double precision) OR (volume > 52::double precision) OR (volume < 0::double precision) OR (speed > 120::double precision) OR (speed < 0::double precision)) AND (date_part('hour'::text, measurement_start) >= 5::double precision) AND (date_part('hour'::text, measurement_start) <= 23::double precision) AND (date_part('day'::text, measurement_start) = 1::double precision))"
"              ->  Index Scan using lane_info_pk on lane_info  (cost=0.00..8.27 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=20581)"
"                    Index Cond: (lane_data_07_08.lane_id = lane_info.lane_id)"
"                    Filter: (inactive IS NULL)"
"Total runtime: 5621.409 ms"


Well instaed of creating extra indexes (since they eat up lot of space) i made use of the whole measurement_start field, so thet it uses the index proeprty and makes the search faster.
So i changed the query to include the measuerment start as follows:

SELECT lane_id,measurement_start,
measurement_end,speed,volume,occupancy,quality,effective_date
FROM tss.lane_data_06_08
WHERE lane_id in(select lane_id from lane_info where inactive is  null )
AND measurement_start between '2008-06-30 05:00:00-04' AND  '2008-06-30 23:00:00-04'
GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
ORDER BY lane_id, measurement_start


Samantha

On 7/1/08, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Jul 1, 2008 at 1:29 PM, samantha mahindrakar
> <sam.mahindrakar@gmail.com> wrote:
> > Hi
> > I have a select statement that runs on a partition having say couple
> > million rows.
> > The tabel has indexes on two colums. However the query uses the
> > non-indexed colums too in its where clause.
> > For example:
> > SELECT lane_id,measurement_start,
> > measurement_end,speed,volume,occupancy,quality,effective_date
> >  FROM tss.lane_data_06_08
> >  WHERE lane_id in(select lane_id from lane_info where inactive is  null )
> >  AND date_part('hour', measurement_start) between 5 and 23
> >  AND date_part('day',measurement_start)=30
> > GROUP BY lane_id,measurement_start,measurement_end,speed,volume,occupancy,quality,effective_date
> > ORDER BY lane_id, measurement_start
> >
> > out of this only lane_id and mesaurement_start are indexed. This query
> > will return around 10,000 rows. But it seems to be taking a long time
> > to execute which doesnt make sense for a select statement. It doesnt
> > make any sense to create index for every field we are gonna use in tne
> > where clause.
> > Isnt there any way we can improve the performance?
>
> I'm guessing that adding an index for either
> date_part('hour',measurement_start) or
> date_part('day',measurement_start) or both would help.
>
> What does explain analyze select ... (rest of query here) say?
>

No comments: