diff --git a/README.md b/README.md index da0bef4..1baefd9 100644 --- a/README.md +++ b/README.md @@ -42,7 +42,6 @@ This dbt package contains macros for SQL functions to run the dbt project on mul - [to_timestamp](#to_timestamp-source) - [to_varchar](#to_varchar-source) - [Generic tests](#Generic-tests) - - [test_edge_count](#test_edge_count-source) - [test_equal_rowcount](#test_equal_rowcount-source) - [test_exists](#test_exists-source) - [test_field_length](#test_field_length-source) @@ -64,9 +63,6 @@ This dbt package contains macros for SQL functions to run the dbt project on mul - [mandatory](#mandatory-source) - [optional](#optional-source) - [optional_table](#optional_table-source) -- [Process mining tables](#Process-mining-tables) - - [generate_edge_table](#generate_edge_table-source) - - [generate_variant](#generate_variant-source) ### Multiple databases @@ -193,19 +189,6 @@ Usage: ### Generic tests -#### test_edge_count ([source](macros/generic_tests/test_edge_count.sql)) -This generic test evaluates whether the number of edges is as expected based on the event log. The expected number of edges is equal to the number of events plus the number of cases, since also edges from the source node and to the sink node are taken into account. - -Usage: -``` -models: - - name: Edge_table_A - tests: - - pm_utils.edge_count: - event_log: 'Event_log_model' - case_ID: 'Case_ID' -``` - #### test_equal_rowcount ([source](macros/generic_tests/test_equal_rowcount.sql)) This generic test evaluates whether two models have the same number of rows. @@ -469,44 +452,3 @@ Usage: `{{ pm_utils.optional_table(source('source_name', 'table_name')) }}` Note: you can only apply the macro for source tables in combination with the `optional()` macro applied to all its fields. - -### Process mining tables - -#### generate_edge_table ([source](macros/process_mining_tables/generate_edge_table.sql)) -The edge table contains all transitions in the process graph. Each transition is indicated by the `From_activity` and the `To_activtiy` and for which case this transition took place. The edge table includes transitions from the source node and to the sink node. - -The required input is an event log model with fields describing the case ID, activity, and event order. With the argument `table_name` you indicate how to name the generated table. The generated table contains at least the following columns: `Edge_ID`, one according to the given case ID, `From_activity` and `To_activity`. It also generates a column `Unique_edge`, which contains the value 1 once per occurrence of an edge per case. - -Optional input is a list of properties. This generates columns like `Unique_edge`, which contains the value 1 once per occurrence of an edge per the given property. The name of this column is `Unique_edge` concatenated with the property. - -Usage: -``` -{{ pm_utils.generate_edge_table( - table_name = 'Edge_table', - event_log_model = 'Event_log', - case_ID = 'Case_ID', - activity = 'Activity', - event_order = 'Event_order', - properties = ['Property1', 'Property2']) -}} -``` - -This generates the table `Edge_table` with columns `Edge_ID`, `Case_ID`, `From_activity`, `To_activity`, `Unique_edge`, `Unique_edge_Property1` and `Unique_edge_Property2`. - -#### generate_variant ([source](macros/process_mining_tables/generate_variant.sql)) -A variant is a particular order of activities that a case executes. The most occurring variant is named "Variant 1", the next most occurring one "Variant 2", etc. This macro generates a cases table with for each case the variant. - -The required input is an event log model with fields describing the case ID, activity, and event order. With the argument `table_name` you indicate how to name the generated table. The generated table contains two columns: one according to the given case ID and `Variant`. - -Usage: -``` -{{ pm_utils.generate_variant( - table_name = 'Cases_table_with_variant', - event_log_model = 'Event_log', - case_ID = 'Case_ID', - activity = 'Activity', - event_order = 'Event_order') -}} -``` - -This generates the table `Cases_table_with_variant` with columns `Case_ID` and `Variant`. diff --git a/macros/generic_tests/test_edge_count.sql b/macros/generic_tests/test_edge_count.sql deleted file mode 100644 index 912ad55..0000000 --- a/macros/generic_tests/test_edge_count.sql +++ /dev/null @@ -1,13 +0,0 @@ -{% macro test_edge_count(model, event_log, case_ID) %} - -{{ config(fail_calc = 'coalesce(diff_count, 0)') }} - -{# Number of edges is equal to the number of events plus number of cases, since source and sink edges are included. #} -select - abs(count_edges - count_events - count_cases) as diff_count -from (select count(*) as count_edges from {{ model }}) as model_edges -cross join (select count(*) as count_events from "{{ model.schema }}"."{{ event_log }}") as model_events -{# Compute number of cases by grouping the event log on case ID. #} -cross join (select count(*) as count_cases from (select "{{ case_ID }}" from "{{ model.schema }}"."{{ event_log }}" group by "{{ case_ID }}") as grouped_event_log) as model_cases - -{% endmacro %} diff --git a/macros/process_mining_tables/generate_edge_table.sql b/macros/process_mining_tables/generate_edge_table.sql deleted file mode 100644 index e0368d5..0000000 --- a/macros/process_mining_tables/generate_edge_table.sql +++ /dev/null @@ -1,107 +0,0 @@ -{%- macro generate_edge_table(table_name, event_log_model, case_ID, activity, event_order, properties) -%} - -Event_log as ( - select * from {{ event_log_model }} -), - -Last_events_of_cases as ( - select - max(Event_log."{{ event_order }}") as "Event_order_last_event" - from Event_log - group by Event_log."{{ case_ID }}" -), - -/* The Edges table contains all edges indicated by the Case ID, the From activity and To activity. -The edge table also includes edges from the source node and to the sink node. The source and sink nodes are indicated by Activity = null. -Optionally, additional properties of the event log can be added to the edge table to optimize metrics. */ -Edges_preprocessing as ( - select - Event_log."{{ case_ID }}", - Event_log."{{ event_order }}", - lag(Event_log."{{ activity }}") over ( - partition by Event_log."{{ case_ID }}" - order by Event_log."{{ event_order }}") as "From_activity", - Event_log."{{ activity }}" as "To_activity" - {% for property in properties -%} - , Event_log."{{ property }}" - {% endfor -%} - from Event_log - union all - -- To generate the edges to the sink node, records are appended with the activtiy of the last event as From activity and null as To activity. - select - Event_log."{{ case_ID }}", - null as "{{ event_order }}", - Event_log."{{ activity }}" as "From_activity", - null as "To_activity" - {% for property in properties -%} - , Event_log."{{ property }}" - {% endfor -%} - from Event_log - inner join Last_events_of_cases - on Event_log."{{ event_order }}" = Last_events_of_cases."Event_order_last_event" -), - --- Order by condition includes Case ID and Event order to sort on unique values. In this way, row_number() returns always the same values. -Edges_with_edge_ID as ( - select - row_number() over (order by Edges_preprocessing."{{ case_ID }}", Edges_preprocessing."{{ event_order }}") as "Edge_ID", - Edges_preprocessing."{{ case_ID }}", - Edges_preprocessing."From_activity", - Edges_preprocessing."To_activity" - {% for property in properties -%} - , Edges_preprocessing."{{ property }}" - {% endfor -%} - from Edges_preprocessing -), - --- For every edge per case the first occurence is marked to optimize metrics. -Edge_first_occurence as ( - select - min(Edges_with_edge_ID."Edge_ID") as "Edge_first_occurence" - from Edges_with_edge_ID - group by Edges_with_edge_ID."{{ case_ID }}", - Edges_with_edge_ID."From_activity", - Edges_with_edge_ID."To_activity" -), - --- Optionally, for other properties besides the Case ID the first occurence of an edge can be marked to optimize metrics. -{% for property in properties -%} - {{'Edge_first_occurence' ~ '_' ~ property }} as ( - select - min(Edges_with_edge_ID."Edge_ID") as "Edge_first_occurence" - from Edges_with_edge_ID - group by Edges_with_edge_ID."{{ property }}", - Edges_with_edge_ID."From_activity", - Edges_with_edge_ID."To_activity" - ), -{% endfor -%} - -{{ table_name }} as ( - select - Edges_with_edge_ID."Edge_ID", - Edges_with_edge_ID."{{ case_ID }}", - Edges_with_edge_ID."From_activity", - Edges_with_edge_ID."To_activity", - -- Every first occurence of an edge is marked as unique, therefore given the value 1. - case - when Edge_first_occurence."Edge_first_occurence" is not null - then 1 - else null - end as "Unique_edge" - {% for property in properties -%} - , case - when {{'Edge_first_occurence' ~ '_' ~ property }}."Edge_first_occurence" is not null - then 1 - else null - end as "{{ 'Unique_edge' ~ '_' ~ property }}" - {% endfor -%} - from Edges_with_edge_ID - left join Edge_first_occurence - on Edges_with_edge_ID."Edge_ID" = Edge_first_occurence."Edge_first_occurence" - {% for property in properties -%} - left join {{'Edge_first_occurence' ~ '_' ~ property }} - on Edges_with_edge_ID."Edge_ID" = {{'Edge_first_occurence' ~ '_' ~ property }}."Edge_first_occurence" - {% endfor -%} -) - -{%- endmacro -%} diff --git a/macros/process_mining_tables/generate_variant.sql b/macros/process_mining_tables/generate_variant.sql deleted file mode 100644 index 27cd508..0000000 --- a/macros/process_mining_tables/generate_variant.sql +++ /dev/null @@ -1,40 +0,0 @@ -{%- macro generate_variant(table_name, event_log_model, case_ID, activity, event_order) -%} - -Event_log as ( - select * from {{ ref(event_log_model) }} -), - --- The activities per Case ID are concatenated, ordered by Event order, such that every particular order of executed activities is a separate variant. -Cases_with_variant_ID as ( - select - Event_log."{{ case_ID }}", - {% if target.type == 'snowflake' -%} - listagg(Event_log."{{ activity }}", '->') - {% elif target.type == 'sqlserver' -%} - string_agg(convert(nvarchar(max), Event_log."{{ activity }}"), '->') - {% endif -%} - within group (order by Event_log."{{ event_order }}") as "Variant_ID" - from Event_log - group by Event_log."{{ case_ID }}" -), - --- A variant number is decided by counting the amount of occurrences of the variant. -Variant as ( - select - Cases_with_variant_ID."Variant_ID", - concat({{ pm_utils.as_varchar('Variant ') }}, row_number() over (order by count(Cases_with_variant_ID."Variant_ID") desc)) as "Variant" - from Cases_with_variant_ID - group by Cases_with_variant_ID."Variant_ID" -), - --- The variants are joined to the cases on the Variant ID to create a table with the Case ID and Variant field. -{{ table_name }} as ( - select - Cases_with_variant_ID."{{ case_ID }}", - Variant."Variant" - from Cases_with_variant_ID - inner join Variant - on Cases_with_variant_ID."Variant_ID" = Variant."Variant_ID" -) - -{%- endmacro -%} diff --git a/third-party_licenses.txt b/third-party_licenses.txt index 8d69f14..15aaaeb 100644 --- a/third-party_licenses.txt +++ b/third-party_licenses.txt @@ -1,5 +1,5 @@ ---------------------------------------------------------------------------------------------- -dbt 1.1.2 +dbt 1.4.6 Apache License Version 2.0, January 2004