Friday, August 24, 2007

SQL Question

I am having SQL issues and I haven't really thought about anything complex in SQL for at least a year .... needless to say I am a bit out of mental shape. Please disregard any really stupid mistakes or oversites and realize I had the best of intentions. If anyone has any suggestions or comments, please post a comment.

Problem: I have a rather large and verbose SQL query that pulls data for an HTML report I need to make. The query is joinining several tables together based on various keys, and is selecting a large number of columns from the result set. Your normal stuff. 3 pieces of the resulting information should be unique for each row (grower, retailer, and product). Basically, each row associates these 3 things.

Example main query:

SELECT
SUBMISSION,
GROWER,
RETAILER,
PRODUCT
...
FROM
SUBMISSION_TABLE,
GROWER_TABLE,
RETAILER_TABLE
...
WHERE
[relate each table to the other]
[do some filtering based on parameters of the report]
...
A completely seperate set of tables needs to be joined, filtered and summarized to find the amount of the given product, the retailer, sold the grower. To find this result we do a SUM on sales whith the known grower, product, and retailer.

Example secondary query:
SELECT
SUM(sales)
FROM
SALES_TABLES...
WHERE
sales_tables.grower = XXXX
AND sales_tables.retailer = YYYY
AND sales_tables.product = ZZZZ
Sounds simple enough?

Question: how can I combine these two queries so that my one query can show the summarized value along with the specifics?

My first thought was to do something like this:
SELECT
SUBMISSION,
GROWER,
RETAILER,
PRODUCT
...
FROM
SUBMISSION_TABLE,
GROWER_TABLE,
RETAILER_TABLE
(SELECT
SUM(sales)
FROM
SALES_TABLES...
WHERE
sales_tables.grower = GROWER
AND sales_tables.retailer = RETAILER
AND sales_tables.product = PRODUCT) sales_vt
...
WHERE
[relate each table to the other]
[do some filtering based on parameters of the report]
But that didn't work b/c you can't reference the outside tables from within the "virtual table query".

Then I tried this:
SELECT
SUBMISSION,
GROWER,
RETAILER,
PRODUCT
...
FROM
SUBMISSION_TABLE,
GROWER_TABLE,
RETAILER_TABLE
(SELECT
grower,
retailer,
product,
SUM(sales)
FROM
SALES_TABLES...
GROUP BY grower, retailer, product) sales,
...
WHERE
[relate each table to the other]
[do some filtering based on parameters of the report]
AND sales.grower = GROWER
AND sales.retailer = RETAILER
AND sales.product = PRODUCT
But that won't work b/c the sales tables are huge and the time to execute goes through the roof which is totally not necessary since we already know what grower-retailer-product tuples we need

I thought about moving everything from the virtual table up into the main table, but then I would need to do a GROUP BY all of the other columns and there are a TON more than just these.

Anyone have any suggestions? Did that make any sense?

1 comment:

Alex said...

My first attempt would be to do the group by, a little copy and paste never killed anyone, right?

On the final query that you were having performance issues with, maybe it would help to explicitly specify the joins (hopefully some of the just left joins). If you just do the select from a bunch of tables I think it does a cartesian product on all of the tables together, which seems like it may grow REALLY fast.