Skip to content

Workflow: Relational only

mayureshkunjir edited this page Aug 16, 2013 · 3 revisions

Query

For a given set of campaigns, find out total sales of products endorsed during the promotion.

Pictorial workflow

alt-text

Description

  • The query first filters promotion for the specified promotions.
  • It next joins this table with each of the three sales channels viz. store_sales, catalog_sales, and web_sales. This produces all sales records for every required promotion.
  • An union is taken of the three join results and a roll-up produces total sales for each promotion.
  • The last task is to find name of the product by doing a look-up into item table.

SQL variant

WITH grouped AS (
SELECT p_promo_id, p_item_sk, SUM(sales) AS total_sales 
FROM (
SELECT p_promo_id, p_item_sk, (ss_quantity * ss_sales_price) AS sales FROM promotion, store_sales 
WHERE p_promo_sk >= :promo_start
AND p_promo_sk <= :promo_end
AND ss_sold_date_sk >= p_start_date_sk
AND ss_sold_date_sk <= p_end_date_sk

UNION

SELECT p_promo_id, p_item_sk, (cs_quantity * cs_sales_price) AS sales FROM promotion, catalog_sales 
WHERE p_promo_sk >= :promo_start 
AND p_promo_sk <= :promo_end
AND cs_sold_date_sk >= p_start_date_sk
AND cs_sold_date_sk <= p_end_date_sk

UNION

SELECT p_promo_id, p_item_sk, (ws_quantity * ws_sales_price) AS sales FROM promotion, web_sales 
WHERE p_promo_sk >= :promo_start
AND p_promo_sk <= :promo_end
AND ws_sold_date_sk >= p_start_date_sk
AND ws_sold_date_sk <= p_end_date_sk
) AS promo_sales
GROUP BY p_promo_id, p_item_sk
) 
SELECT p_promo_id, i_product_name, total_sales 
FROM item, grouped
WHERE i_item_sk = p_item_sk

Note: ':promo_start' and ':promo_end' are selectivity parameters

HiveQL variant

SELECT p_promo_id, i_product_name, total_sales 
FROM (
SELECT p_promo_id, p_item_sk, SUM(sales) AS total_sales 
FROM (
SELECT p_promo_id, p_item_sk, (ss_quantity * ss_sales_price) AS sales FROM promotion p JOIN store_sales ss 
ON (p.p_item_sk = ss.ss_item_sk)
WHERE p_promo_sk >= :promo_start
AND p_promo_sk <= :promo_end
AND ss_sold_date_sk >= p_start_date_sk
AND ss_sold_date_sk <= p_end_date_sk

UNION ALL

SELECT p_promo_id, p_item_sk, (cs_quantity * cs_sales_price) AS sales FROM promotion p JOIN catalog_sales cs
ON (p.p_item_sk = cs.cs_item_sk)
WHERE p_promo_sk >= :promo_start
AND p_promo_sk <= :promo_end
AND cs_sold_date_sk >= p_start_date_sk
AND cs_sold_date_sk <= p_end_date_sk

UNION ALL

SELECT p_promo_id, p_item_sk, (ws_quantity * ws_sales_price) AS sales 
FROM promotion p JOIN web_sales ws
ON (p.p_item_sk = ws.ws_item_sk) 
WHERE p_promo_sk >= :promo_start
AND p_promo_sk <= :promo_end
AND ws_sold_date_sk >= p_start_date_sk
AND ws_sold_date_sk <= p_end_date_sk
) promo_sales
GROUP BY p_promo_id, p_item_sk 
) g
JOIN item i
ON (g.p_item_sk = i.i_item_sk)

Note: ':promo_start' and ':promo_end' are selectivity parameters