Friday, June 6, 2008

[GENERAL] hopefully a brain teaser, can't quite figure out query

Hi guys, maybe this is just a teaser for me, but any help would be
awesome. My best crack at the solution is not returning yet after a
good ten minutes. I'll post the explain analyze if it ever comes
back. I have no indexing, which is probably embarrassing, I'm just
not quite sure what to index or really how.

So, I've got two tables, which I'm going to try to simplify and still
get across:

The small table is a listing of county fips codes, their name, and the
geometry for the county. Each fips is only listed once. The big table
is multiple emissions for each county, the parameter for the emission,
and the source code for the emission (scc). Each county in big tbale
has many entries, variable number of pollutant types, variable number
of scc's.

small table:
fips (int), name(string), geom(geometry)
123, "some county", "some geometry"
124, "some other county", "some other geometry"
etc.

big table:
fips (int), pollutant(string), value(double), scc(int)
123, "co", 1000, 1000
123, "co", 1500, 1000
123, "co", 500, 1001
123, "co", 550, 1001
123, "co", 1500, 1002
123, "co", 50, 1002

123, "so2", 1100, 1000
123, "so2", 510, 1001
123, "so2", 1510, 1002
etc.

So the user supplies the pollutant, a number of scc's, and a minimum
value.
I need to select all the fips from the small table, as well as the sum
of values in the big table (for that pollutant and scc codes) WHERE
the sum of the values in the big table (for that pollutant and those
scc codes) are larger than the given value.

So, for the above, say the user picked "CO" for pollutant, 1001 and
1002 for SCC.
the rows in big table with 1001 OR 1002 for scc AND "co" for pollutant
are:
123, "co", 500, 1001
123, "co", 550, 1001
123, "co", 1500, 1002
123, "co", 50, 1002

I would return the 123 fips ONLY if the value provided was less than
the sum of the values for all scc's (500+550+1500+50 = 2600), as well
as the sum for those values.

so, return set would be
fips, name, value
123, "some county", 2600

for each fips in small table...

Boy I wonder if that makes sense, maybe some pseudo SQL speak would be
better:

SELECT small.fips, small.name, sum(big.value)
FROM small, big
WHERE
small.fips in (
SELECT fips from big
WHERE ((pollutant='co') AND
(
(scc LIKE '21%') OR (scc LIKE '2301%') OR (scc LIKE '2801000000%')
)
HAVING SUM(value > 2000)
)
GROUP BY small.fips, small.name;

This is the query that isn't returning yet.
If anyone has any questions, comments, or any suggestions at all, I'll
do my best to respond ASAP.
Any help is extremely appreciated!
-Ed

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

No comments: