diff --git a/cip/1.accepted/CIP2017-06-18-multiple-graphs.adoc b/cip/1.accepted/CIP2017-06-18-multiple-graphs.adoc
index 391ed719b7..59ba81b07d 100644
--- a/cip/1.accepted/CIP2017-06-18-multiple-graphs.adoc
+++ b/cip/1.accepted/CIP2017-06-18-multiple-graphs.adoc
@@ -720,7 +720,76 @@ This is the final step of the entire data integration pipeline, we return this g
 image::opencypher-PersonCityCriminalEvents-graph.jpg[Graph,700,550]
-// ._The full data integration query pipeline is given by_:
+
+[[data-aggregation-example]]
+=== Using a pipeline to perform aggregations and return tabular data and graphs
+
+This example shows how to aggregate detailed sales data within a graph -- in effect, performing a 'roll-up' -- in order to obtain a high-level, summarized view of the data, which is itself a graph.
+The summarized graph may be used to produce further high-level reports, and also to 'drill down' into the data by probing the graph for more detailed information.
+
+Assume we have the graph *SalesDetail*, representing the sale of products in stores across various regions:
+
+image::opencypher-SalesDetail-graph.jpg[Graph,800,700]
+
+This models the following entities:
+
+* Regions may have many stores.
+* Stores:
+** A store is identified by a unique `code`.
+** A store is contained in exactly one region.
+** A store may have multiple orders.
+* Products:
+** A product is identified by a unique `code`.
+** A product has an `RRP` property (Recommended Retail Price).
+** A product may appear in one or more orders as a product _item_.
+* Sales orders:
+** An order is identified by a unique order number, given by `num`.
+** The `YYYYMM` property represents the year and month portion of the date of the order.
+** An order is associated with exactly one store and contains one or more product items; each item records that the product was sold in that store as part of the order.
+** The relationship between an order and a product has the following properties:
+*** `soldPrice`: the price at which the product item was actually sold (usually lower than the product's RRP).
+*** `numItemsSold`: the number of units of the product sold in the order.
+
+The following pipeline will create a summarized graph view of this data and return it.
+
+[source, cypher]
+----
+[ 0] FROM SalesDetail
+[ 1] MATCH (p:Product)-[r:IN]->(:Order)<-[:HAS]-(s:Store)
+[ 2] WITH p, s, sum(r.soldPrice * r.numItemsSold) AS storeProductTotal
+[ 3] CONSTRUCT ON GRAPH CLONE p, s
+[ 4] CREATE (p)-[:SUMMARY {totalSales: storeProductTotal}]->(s)
+
+[ 5] WITH p, sum(storeProductTotal) AS productTotal
+[ 6] CONSTRUCT ON GRAPH CLONE p
+[ 7] CREATE (p)-[:SUMMARY]->(:SUMMARY {totalSales: productTotal})
+
+[ 8] WITH p
+[ 9] MATCH (p)-[r:SUMMARY]-(s:Store)-[:IN]-(reg:Region)
+[10] WITH s, reg, sum(r.totalSales) AS storeTotal
+[11] CONSTRUCT ON GRAPH CLONE s, reg
+[12] CREATE (s)-[:SUMMARY]->({totalSales: storeTotal})
+[13] WITH reg, sum(storeTotal) AS regionTotal
+[14] CREATE (reg)-[:SUMMARY]->({totalSales: regionTotal})
+
+[15] WITH reg
+[16] MATCH (reg)<-[:IN]-(:Store)<-[summary:SUMMARY]-(p:Product)
+[17] WITH reg, p, sum(summary.totalSales) AS regionProductTotal
+[18] CONSTRUCT ON GRAPH CLONE reg, p
+[19] CREATE (reg)-[:SUMMARY {totalSales: regionProductTotal}]->(p)
+[20] RETURN GRAPH
+----
+
+
+We start by specifying that we are working on *SalesDetail* [0], and then find every product item sold in an order, together with the store in which that order was placed [1].
+The next step is to sum all sales grouped by product and store [2]. We then start building up the summary graph [3], cloning the product and store nodes and adding a `SUMMARY` relationship directly between each product and each store it was sold in, bypassing the order node [4].
+
+Next, we aggregate all sales by product [5], and use this to construct a graph [6] in which each product is linked by a `SUMMARY` relationship to a new node holding its total sales [7].
+
+So far, we have been working with the rows produced by the first `MATCH` [1], but now it is time to drop the incoming driving table [8] and start a new `MATCH` [9]. This time we match the `SUMMARY` relationships added between products and stores in [4], and use them to compute the total sales of each store [10], which are attached to the store [12] and then summed into the total sales of each region [13].
+
+// TODO: Finish explaining this example
+
 //
@@ -796,191 +865,6 @@ image::opencypher-PersonCityCriminalEvents-graph.jpg[Graph,700,550]
 //
 //
-// [[data-aggregation-example]]
-// === Using a pipeline to perform aggregations and return tabular data and graphs
-//
-// This example shows how to aggregate detailed sales data within a graph -- in effect, performing a 'roll-up' -- in order to obtain a high-level summarized view of the data, stored and returned in another graph, as well as returning an even higher-level view as an executive report.
-// The summarized graph may be used to draw further high-level reports, but may also be used to undertake 'drill-down' actions by probing into the graph to extract more detailed information.
-//
-// Assume we have the graph *SalesDetail*, representing the sale of products in stores across various regions:
-//
-// image::opencypher-SalesDetail-graph.jpg[Graph,800,700]
-//
-// This models the following entities:
-//
-// * Regions may have many stores.
-// * Stores:
-// ** A store is identified by a unique `code`.
-// ** A store is contained in exactly one region.
-// ** A store may have multiple orders.
-// * Products:
-// ** A product is identified by a unique `code`.
-// ** A product has a `RRP` property (Recommended Retail Price).
-// ** A product may appear in one or more orders as a product _item_.
-// * Sales orders:
-// ** An order is identified by a unique order number, given by `num`.
-// ** The `YYYYMM` property represents the year and month portion of the date of the order.
-// ** An order is associated with exactly one store and contains one or more product items, representing the fact that the product item was sold in the store and is a part of the order.
-// ** The relationship of between an order and a product contains the following properties:
-// *** `soldPrice`: the price at which the product item was actually sold (usually lower than the product's RRP).
-// *** `numItemsSold`: the number of the actual product items sold in the order.
-//
-// The following pipeline will create a summarized view of this data, and store it in a new summary graph called *SalesSummary*.
-//
-// We begin by referencing the *SalesDetail* graph, and matching on all products in all orders for all stores in all regions.
-//
-// [source, cypher]
-// ----
-// FROM GRAPH SalesDetail AT ‘graph://...’
-// MATCH (p:Product)-[r:IN]->(o:Order)<-[HAS]-(s:Store)-[:IN]->(reg:Region)
-// ----
-//
-// We aggregate the (tabular) data across all orders in order to obtain the total sales amount grouped by the product, store and region, and alias this value as `storeProductTotal`.
-// As this tabular data is required to populate the summary graph later on, we pass it further down the pipeline:
-//
-// [source, cypher]
-// ----
-// WITH reg.name AS regionName,
-//      s.code AS storeCode,
-//      p.code AS productCode,
-//      sum(r.soldPrice * r.numItemsSold) AS storeProductTotal
-// ----
-//
-// The tabular data consists of the following:
-//
-// [source, cypher]
-// ----
-// +------------+-----------+-------------+-------------------+
-// | regionName | storeCode | productCode | storeProductTotal |
-// +------------+-----------+-------------+-------------------+
-// | APAC       | AC-888    | PEN-1       | 20.00             |
-// | APAC       | AC-888    | TOY-1       | 45.00             |
-// | EMEA       | LK-709    | BOOK-2      | 10.00             |
-// | EMEA       | LK-709    | TOY-1       | 40.00             |
-// | EMEA       | LK-709    | BOOK-5      | 15.00             |
-// | EMEA       | WW-531    | BOOK-5      | 18.00             |
-// | EMEA       | WW-531    | BULB-2      | 190.00            |
-// | EMEA       | WW-531    | PC-1        | 440.00            |
-// +------------+-----------+-------------+-------------------+
-// 8 rows
-// ----
-//
-// Next, we read from the *SalesDetail* graph to get the store, product and region information:
-//
-// [source, cypher]
-// ----
-// MATCH (p:Product)-[:IN]->(o:Order)<-[:HAS]-(s:Store)-[:IN]->(r:Region)
-// ----
-//
-// We now create a new graph, *SalesSummary*, containing the summarized view of the sales information across regions, products and stores:
-//
-// [source, cypher]
-// ----
-// INTO NEW GRAPH SalesSummary
-// MERGE (s:Store {storeCode: s.code})
-// MERGE (r:Region {name: r.name})
-// MERGE (p:Product {productCode: p.code, RRP: p.RRP})
-// MERGE (s)-[:IN]->(r)
-// MERGE (p)-[:SOLD_IN]->(s)
-//
-// // Get the total amount sold for a store
-// WITH storeCode, sum(storeProductTotal) AS totalSales
-// // Get the total amount sold for a product
-// WITH productCode, sum(storeProductTotal) AS soldTotal
-//
-// // Update all store nodes with the new totalSales property
-// MATCH (s:Store)
-// SET s.totalSales = totalSales
-// WHERE s.code = storeCode
-//
-// // Update all product nodes with the new soldTotal property
-// MATCH (p:Product)
-// SET p.soldTotal = soldTotal
-// WHERE p.code = productCode
-//
-// // Update all (:Product)-[SOLD_IN]->(:Store) relationships with the new sold property
-// MATCH (p:Product)-[r:SOLD_IN]->(s:Store)
-// SET r.sold = storeProductTotal
-// WHERE p.code = productCode
-//   AND s.code = storeCode
-// ----
-//
-// As a final step, the *SalesSummary* graph is returned, along with a high-level summarized tabular view of store sales data.
-//
-// [source, cypher]
-// ----
-// RETURN regionName,
-//        storeCode,
-//        sum(storeProductTotal) AS totalStoreSales
-// GRAPH SalesSummary
-// ----
-//
-// The *SalesSummary* graph is comprised of the following:
-//
-// image::opencypher-SalesSummary-graph.jpg[Graph,800,700]
-//
-// The high-level summarized tabular data consists of the following:
-//
-// [source, cypher]
-// ----
-// +------------+-----------+-----------------+
-// | regionName | storeCode | totalStoreSales |
-// +------------+-----------+-----------------+
-// | APAC       | AC-888    | 65.00           |
-// | EMEA       | LK-709    | 65.00           |
-// | EMEA       | WW-531    | 648.00          |
-// +------------+-----------+-----------------+
-// 3 rows
-// ----
-//
-// We note that the *SalesSummary* graph can be used to generate further high-level sales summaries, such as the total sales of a particular product (shown <>), as well as more detailed views.
-//
-// ._The full aggregation query pipeline is given by_:
-// [source, cypher]
-// ----
-// FROM GRAPH SalesDetail AT ‘graph://...’
-// MATCH (p:Product)-[r:IN]->(o:Order)<-[HAS]-(s:Store)-[:IN]->(reg:Region)
-//
-// WITH reg.name AS regionName,
-//      s.code AS storeCode,
-//      p.code AS productCode,
-//      sum(r.soldPrice * r.numItemsSold) AS storeProductTotal
-//
-// MATCH (p:Product)-[:IN]->(o:Order)<-[:HAS]-(s:Store)-[:IN]->(r:Region)
-//
-// INTO NEW GRAPH SalesSummary
-// MERGE (s:Store {code: s.code})
-// MERGE (r:Region {name: r.name})
-// MERGE (p:Product {code: p.code, RRP: p.RRP})
-// MERGE (s)-[:IN]->(r)
-// MERGE (p)-[:SOLD_IN]->(s)
-//
-// // Get the total amount sold for a store
-// WITH storeCode, sum(storeProductTotal) AS totalSales
-// //Get the total amount sold for a product
-// WITH productCode, sum(storeProductTotal) AS soldTotal
-//
-// // Update all store nodes with the new totalSales property
-// MATCH (s:Store)
-// SET s.totalSales = totalSales
-// WHERE s.code = storeCode
-//
-// // Update all product nodes with the new soldTotal property
-// MATCH (p:Product)
-// SET p.soldTotal = soldTotal
-// WHERE p.code = productCode
-//
-// // Update all (:Product)-[SOLD_IN]->(:Store) relationships with the new sold property
-// MATCH (p:Product)-[r:SOLD_IN]->(s:Store)
-// SET r.sold = storeProductTotal
-// WHERE p.code = productCode
-//   AND s.code = storeCode
-//
-// RETURN regionName,
-//        storeCode,
-//        sum(storeProductTotal) AS totalStoreSales
-// GRAPH SalesSummary
-// ----
 //
 // [[data-aggregation-external-example]]
 // === Using a pipeline in an external execution context
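The roll-up arithmetic behind the `sum` steps of the new pipeline can be checked independently of any Cypher implementation. The sketch below is plain Python, not Cypher, and rests on two assumptions. First, the `storeProductTotal` table in the removed draft is taken to be what step [2] yields for the *SalesDetail* graph. Since each store lies in exactly one region, grouping by store and product (as step [2] does) gives the same rows as the old grouping by region, store and product. Second, the sketch models only the groupings of steps [5], [10], [13] and [17], not `CONSTRUCT` or `CLONE`. The helper `roll_up` is hypothetical.

```python
from collections import defaultdict

# Assumed output of step [2]: (region, store, product, storeProductTotal) rows,
# copied from the storeProductTotal table in the removed draft of this example.
STORE_PRODUCT_TOTALS = [
    ("APAC", "AC-888", "PEN-1", 20.00),
    ("APAC", "AC-888", "TOY-1", 45.00),
    ("EMEA", "LK-709", "BOOK-2", 10.00),
    ("EMEA", "LK-709", "TOY-1", 40.00),
    ("EMEA", "LK-709", "BOOK-5", 15.00),
    ("EMEA", "WW-531", "BOOK-5", 18.00),
    ("EMEA", "WW-531", "BULB-2", 190.00),
    ("EMEA", "WW-531", "PC-1", 440.00),
]


def roll_up(pairs):
    """Hypothetical helper: sum values per key, mimicking Cypher's implicit
    grouping in `WITH key, sum(value) AS total`."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Step [5]: total sales per product.
product_totals = roll_up((prod, total) for _, _, prod, total in STORE_PRODUCT_TOTALS)

# Step [10]: total sales per store.
store_totals = roll_up((store, total) for _, store, _, total in STORE_PRODUCT_TOTALS)

# Step [13]: total sales per region, obtained by summing the per-store totals.
store_region = {store: region for region, store, _, _ in STORE_PRODUCT_TOTALS}
region_totals = roll_up(
    (store_region[store], total) for store, total in store_totals.items()
)

# Step [17]: total sales per (region, product) pair.
region_product_totals = roll_up(
    ((region, prod), total) for region, _, prod, total in STORE_PRODUCT_TOTALS
)

if __name__ == "__main__":
    print("store totals [10]:", store_totals)
    print("region totals [13]:", region_totals)
    # A roll-up should preserve the grand total at every level of aggregation.
    print("grand total:", sum(product_totals.values()), sum(region_totals.values()))
```

The store totals come out as 65.0, 65.0 and 648.0, matching the `totalStoreSales` table in the removed draft. The per-product and per-region totals both add up to the same grand total of 778.0, which is a quick consistency check for any roll-up.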