[issue 465]Create a documentation section to use Grafana DataSource with SonataFlow Prometheus metrics: address review comments
jianrongzhang89 committed Dec 12, 2024
1 parent 79a31d8 commit c406023
Showing 7 changed files with 172 additions and 166 deletions.
4 changes: 3 additions & 1 deletion serverlessworkflow/modules/ROOT/nav.adoc
@@ -83,7 +83,9 @@
*** xref:cloud/operator/service-discovery.adoc[Service Discovery]
*** xref:cloud/operator/using-persistence.adoc[Workflow Persistence]
*** xref:cloud/operator/configuring-workflow-eventing-system.adoc[Workflow Eventing System]
*** xref:cloud/operator/monitoring-workflows.adoc[Workflow Monitoring]
*** Monitoring
**** xref:cloud/operator/monitoring-workflows.adoc[Workflow Monitoring]
**** xref:cloud/operator/sonataflow-metrics.adoc[Prometheus Metrics for Workflows]
// *** xref:cloud/operator/configuring-knative-eventing-resources.adoc[Knative Eventing]
*** xref:cloud/operator/known-issues.adoc[Roadmap and Known Issues]
*** xref:cloud/operator/add-custom-ca-to-a-workflow-pod.adoc[Add Custom CA to Workflow Pod]
@@ -0,0 +1,135 @@
== Overview

In {product_name}, you can check the following metrics:

* `kogito_process_instance_started_total`: Number of started workflows (a workflow that has started might be running or completed)
* `kogito_process_instance_running_total`: Number of running workflows
* `kogito_process_instance_completed_total`: Number of completed workflows
* `kogito_process_instance_error`: Number of workflows that report an error (a workflow with an error might still be running or might have completed)
* `kogito_process_instance_duration_seconds`: Duration of a process instance in seconds
* `kogito_node_instance_duration_milliseconds`: Duration of relevant nodes in milliseconds (a workflow is composed of nodes, and you might be interested in the time consumed by a specific node type)
* `sonataflow_input_parameters_counter`: Records the occurrences of each <"param_name","param_value"> input parameter pair per `processId`

[NOTE]
====
Internally, workflows are referred to as processes. Therefore, `processId` and `processName` are the workflow ID and name, respectively.
====

Each of the metrics listed above carries a `process_id` label that identifies the workflow. For example, the following `kogito_process_instance_completed_total` sample carries the labels for the `callbackstatetimeouts` workflow:

.Example `kogito_process_instance_completed_total` metric
[source,yaml]
----
# HELP kogito_process_instance_completed_total Completed Process Instances
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0
----
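
The label set in a sample line such as the one above can be read programmatically. The following is a minimal Python sketch, not part of {product_name}: the `parse_sample` helper and its regular expressions are illustrative assumptions, they tolerate the trailing comma inside the braces shown in the example, and they do not handle escaped quotes inside label values.

```python
import re

# One sample line copied from the example above (note the trailing comma
# inside the braces, which this exposition output includes).
SAMPLE = (
    'kogito_process_instance_completed_total{'
    'app_id="sonataflow-process-monitoring-listener",'
    'artifactId="serverless-workflow-project",'
    'process_id="callbackstatetimeouts",'
    'process_state="Completed",'
    'version="1.0.0-SNAPSHOT",} 3.0'
)

# Shape of a labelled sample: name{labels} value
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# key="value" pairs; escaped quotes inside values are not handled here
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(SAMPLE)
print(name, labels["process_id"], labels["process_state"], value)
```

Here `labels["process_id"]` yields the workflow ID, which is the label the dashboard queries filter on.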

[NOTE]
====
Internally, {product_name} uses the Quarkus Micrometer extension, which also exposes built-in metrics. You can disable the Micrometer metrics in {product_name}. For more information, see link:https://quarkus.io/guides/micrometer[Quarkus - Micrometer Metrics].
====

== Metrics Description

=== kogito_process_instance_started_total
Counts the number of started workflow instances.

[source, yaml]
----
# HELP kogito_process_instance_started_total Started Process Instances
# TYPE kogito_process_instance_started_total counter
kogito_process_instance_started_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 7.0
----

=== kogito_process_instance_running_total
Records the number of running workflow instances.

[NOTE]
====
This includes workflow instances that are in the `Error` state, since the error state is not a terminal state.
Process instances that have reached a terminal status, i.e. `Completed` or `Aborted`, are not present in this metric.
====

[source, yaml]
----
# HELP kogito_process_instance_running_total Running Process Instances
# TYPE kogito_process_instance_running_total gauge
kogito_process_instance_running_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 4.0
----
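
The sample values on this page happen to satisfy started = running + completed (7 = 4 + 3), which is what the metric descriptions suggest: a started instance is either still running (possibly in the `Error` state) or has reached a terminal status. The Python sketch below checks that relationship. Treat it as an inference rather than a guarantee: it assumes all three values come from the same application instance and that no counter has been reset in between. The `unaccounted_instances` helper is a hypothetical name.

```python
def unaccounted_instances(started, running, terminal):
    """Return started - (running + terminal); 0 means the three metrics agree."""
    return started - (running + terminal)


# Sample values taken from the metric examples on this page.
started = 7.0    # kogito_process_instance_started_total
running = 4.0    # kogito_process_instance_running_total (includes Error state)
terminal = 3.0   # kogito_process_instance_completed_total (Completed + Aborted)

gap = unaccounted_instances(started, running, terminal)
print(gap)
```

A non-zero result would not necessarily indicate a problem; it may only mean that the assumptions above do not hold for the scraped data.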

=== kogito_process_instance_completed_total
Counts the workflow instances that have reached a terminal status, `Aborted` or `Completed`, and are therefore considered completed.

[NOTE]
====
These are the only two terminal statuses. The `Error` state is not terminal.
Additionally, the metric carries a `process_state` label, set to `Completed` or `Aborted`, that records exactly which of the two terminal statuses was reached.
====

[source, yaml]
----
# HELP kogito_process_instance_completed_total Completed Process Instances
# TYPE kogito_process_instance_completed_total counter
kogito_process_instance_completed_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",process_state="Completed",version="1.0.0-SNAPSHOT",} 3.0
----

=== kogito_process_instance_error
Records the number of errors that have occurred per processId and error, including the error message.

[source, yaml]
----
# HELP kogito_process_instance_error Number of errors that has occurred
# TYPE kogito_process_instance_error counter
----

=== kogito_process_instance_duration_seconds
Records the duration of a workflow instance that has reached a terminal state, i.e. `Aborted` or `Completed`. This metric is registered when the workflow instance reaches the terminal state.

[source, yaml]
----
# HELP kogito_process_instance_duration_seconds_max Process Instances Duration
# TYPE kogito_process_instance_duration_seconds_max gauge
kogito_process_instance_duration_seconds_max{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 30.0
# HELP kogito_process_instance_duration_seconds Process Instances Duration
# TYPE kogito_process_instance_duration_seconds summary
kogito_process_instance_duration_seconds_count{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 3.0
kogito_process_instance_duration_seconds_sum{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 90.0
----
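
Because the metric is exposed as a summary, the mean duration of finished instances can be derived by dividing `_sum` by `_count`. The Python sketch below works through the sample values above; the `mean_duration_seconds` helper is a hypothetical name used only for illustration.

```python
def mean_duration_seconds(duration_sum, duration_count):
    """Mean duration of finished instances; None when nothing has finished yet."""
    if duration_count == 0:
        return None
    return duration_sum / duration_count


# Values from kogito_process_instance_duration_seconds_sum and _count above.
mean = mean_duration_seconds(90.0, 3.0)
print(mean)
```

With the sample values, the mean equals the `_max` gauge (30 seconds), which suggests that all three finished instances took about the same time.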

=== kogito_node_instance_duration_milliseconds
Records the execution duration of the nodes that are “relevant” to the workflow. The metric is calculated when a given node finishes executing.

[source, yaml]
----
# HELP kogito_node_instance_duration_milliseconds_max Relevant nodes duration in milliseconds
# TYPE kogito_node_instance_duration_milliseconds_max gauge
kogito_node_instance_duration_milliseconds_max{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 30014.0
# HELP kogito_node_instance_duration_milliseconds Relevant nodes duration in milliseconds
# TYPE kogito_node_instance_duration_milliseconds summary
kogito_node_instance_duration_milliseconds_count{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 3.0
kogito_node_instance_duration_milliseconds_sum{artifactId="serverless-workflow-project",node_name="CallbackState",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 90128.0
----
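
A per-node mean can be obtained by summing `_sum` and `_count` for each `node_name` and dividing, in the spirit of a PromQL expression of the form `sum by (node_name) (..._sum) / sum by (node_name) (..._count)`. The Python sketch below applies that idea to the single series shown above; the `mean_ms_by_node` helper and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# (node_name, _sum in ms, _count) taken from the sample output above.
# With several workflows or pods there would be one entry per series.
samples = [
    ("CallbackState", 90128.0, 3.0),
]


def mean_ms_by_node(series):
    """Sum _sum and _count per node_name, then divide the totals."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for node_name, duration_sum, duration_count in series:
        totals[node_name][0] += duration_sum
        totals[node_name][1] += duration_count
    return {
        node: total_sum / total_count
        for node, (total_sum, total_count) in totals.items()
        if total_count > 0
    }


means = mean_ms_by_node(samples)
# Convert milliseconds to seconds for readability.
print({node: round(ms / 1000, 2) for node, ms in means.items()})
```

Summing before dividing weights each series by its number of node executions, which a plain average of per-series means would not do.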

=== sonataflow_input_parameters_counter_total

Records the occurrences of <"param_name", "param_value"> per processId.

[NOTE]
====
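Parameters whose values are JSON objects or arrays are flattened. For example, the `param_name` `surname.sur1` in the output below refers to the `sur1` field nested under `surname`.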
====

[source, yaml]
----
# HELP sonataflow_input_parameters_counter_total Input parameters
# TYPE sonataflow_input_parameters_counter_total counter
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="name",param_value="walter",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="surname.sur1",param_value="Medvedeo",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 1.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="name",param_value="bob",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 5.0
sonataflow_input_parameters_counter_total{app_id="sonataflow-process-monitoring-listener",artifactId="serverless-workflow-project",param_name="surname",param_value="esponja",process_id="callbackstatetimeouts",version="1.0.0-SNAPSHOT",} 5.0
----
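
The flattening mentioned in the note can be pictured with the following Python sketch. It is not the {product_name} implementation: the dot-separated naming is inferred from the `surname.sur1` sample above, addressing array elements by index is purely an assumption, and the `flatten` helper and the input document are hypothetical.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {dotted_name: scalar} pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Assumption: array elements are addressed by index, e.g. "tags.0".
        items = ((str(index), item) for index, item in enumerate(value))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, name))
    return flat


# Hypothetical workflow input consistent with the first two sample lines above.
workflow_input = {"name": "walter", "surname": {"sur1": "Medvedeo"}}
flat = flatten(workflow_input)
print(flat)
```

Each resulting key/value pair corresponds to one `param_name`/`param_value` label combination on the counter.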
8 changes: 8 additions & 0 deletions serverlessworkflow/modules/ROOT/pages/cloud/index.adoc
@@ -136,6 +136,14 @@ xref:cloud/operator/monitoring-workflows.adoc[]
Learn how to configure Prometheus, Grafana and Grafana Dashboard for monitoring of workflow instances
--

[.card]
--
[.card-title]
xref:cloud/operator/sonataflow-metrics.adoc[]
[.card-description]
Learn about the Prometheus metrics available for workflow monitoring
--

[.card]
--
[.card-title]
@@ -141,7 +141,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_started_total{service=~\"$workflow\"})",
"expr": "sum (kogito_process_instance_started_total{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -216,7 +216,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_completed_total{service=~\"$workflow\",process_state=\"Completed\"})",
"expr": "sum (kogito_process_instance_completed_total{process_id=~\"$workflow\",process_state=\"Completed\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -291,7 +291,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_running_total{service=~\"$workflow\"})",
"expr": "sum (kogito_process_instance_running_total{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -366,7 +366,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_completed_total{service=~\"$workflow\",process_state=\"Aborted\"})",
"expr": "sum (kogito_process_instance_completed_total{process_id=~\"$workflow\",process_state=\"Aborted\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -441,7 +441,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_error{service=~\"$workflow\"})",
"expr": "sum (kogito_process_instance_error{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -634,7 +634,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum by(service) (kogito_process_instance_started_total{service=~\"$workflow\"})",
"expr": "sum by(process_id) (kogito_process_instance_started_total{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -709,7 +709,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_completed_total{service=~\"$workflow\",process_state=\"Completed\"})",
"expr": "sum (kogito_process_instance_completed_total{process_id=~\"$workflow\",process_state=\"Completed\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -784,7 +784,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_running_total{service=~\"$workflow\"})",
"expr": "sum (kogito_process_instance_running_total{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -859,7 +859,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_completed_total{service=~\"$workflow\",process_state=\"Aborted\"})",
"expr": "sum (kogito_process_instance_completed_total{process_id=~\"$workflow\",process_state=\"Aborted\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -934,7 +934,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum (kogito_process_instance_error{service=~\"$workflow\"})",
"expr": "sum (kogito_process_instance_error{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -1050,7 +1050,7 @@
},
"disableTextWrap": false,
"editorMode": "code",
"expr": "sum(kogito_process_instance_duration_seconds_sum{service=~\"$workflow\"})/sum(kogito_process_instance_duration_seconds_count{service=~\"$workflow\"})",
"expr": "sum(kogito_process_instance_duration_seconds_sum{process_id=~\"$workflow\"})/sum(kogito_process_instance_duration_seconds_count{process_id=~\"$workflow\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
@@ -1141,7 +1141,7 @@
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "sum by (node_name) (kogito_node_instance_duration_milliseconds_sum{service=~\"$workflow\"})/sum by (node_name) (kogito_node_instance_duration_milliseconds_count{service=~\"$workflow\"})",
"expr": "sum by (node_name) (kogito_node_instance_duration_milliseconds_sum{process_id=~\"$workflow\"})/sum by (node_name) (kogito_node_instance_duration_milliseconds_count{process_id=~\"$workflow\"})",
"format": "heatmap",
"fullMetaSearch": false,
"includeNullMetadata": true,
@@ -1342,7 +1342,7 @@
{
"matcher": {
"id": "byName",
"options": "process_id"
"options": "service"
},
"properties": [
{
@@ -1366,7 +1366,7 @@
{
"matcher": {
"id": "byName",
"options": "service"
"options": "process_id"
},
"properties": [
{
@@ -1425,7 +1425,7 @@
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "sonataflow_input_parameters_counter_total{service=~\"$workflow\"}",
"expr": "sonataflow_input_parameters_counter_total{process_id=~\"$workflow\"}",
"format": "table",
"fullMetaSearch": false,
"includeNullMetadata": true,
@@ -1450,7 +1450,7 @@
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(kogito_process_instance_started_total,service)",
"definition": "label_values(kogito_process_instance_started_total,process_id)",
"description": "workflow",
"hide": 0,
"includeAll": true,
@@ -1460,7 +1460,7 @@
"options": [],
"query": {
"qryType": 1,
"query": "label_values(kogito_process_instance_started_total,service)",
"query": "label_values(kogito_process_instance_started_total,process_id)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 2,
@@ -1476,7 +1476,7 @@
},
"filters": [
{
"key": "service",
"key": "process_id",
"operator": "=",
"value": "greeting"
}
@@ -1,7 +1,7 @@
= Monitoring Workflows
:compat-mode!:
// Metadata:
:description: Workflows monitoring configuration configuration
:description: Workflows monitoring configuration
:keywords: kogito, sonataflow, workflow, operator, kubernetes, prometheus, grafana

// External pages
@@ -322,11 +322,11 @@ Click `+` -> `Import dashboard`, copy the json model data for xref::cloud/operat
image::cloud/operator/monitoring/grafana-dashboard-example.png[]

=== Customize or build your own dashboard
You can customize or build your own dashboard. For more information, see xref:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards] and xref:cloud/operator/sonataflow-metrics.adoc[SonataFlow Metrics].
You can customize or build your own dashboard. For more information, see link:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards] and xref:cloud/operator/sonataflow-metrics.adoc[Prometheus Metrics for Workflows].

== Additional resources

* xref:cloud/operator/sonataflow-metrics.adoc[SonataFlow Metrics]
* xref:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards]
* xref:cloud/operator/sonataflow-metrics.adoc[Prometheus Metrics for Workflows]
* link:https://grafana.com/docs/grafana/latest/dashboards[Grafana Dashboards]

include::../../../pages/_common-content/report-issue.adoc[]