A collection of dashboards and their component parts for cloud service provider SaaS services.
-
Create a new
.libsonnet
file on thecsp-mixin/signals
folder that will contain the general settings of your dashboard and its panels like title, description, panel queries, legend template, discovery metric, variable definitions, aggregation level... -
Define variables definition by setting
groupLabels
andinstanceLabels
in your signal. You can use the global values defined onazureconfig.libsonnet
orgcpconfig.libsonnet
or override them as needed. Ex. You can initialize groupLabels like:groupLabels: this.groupLabels
or override it like:groupLabels: ['job', 'resourceName']
. -
Add the new signal definition to the
config.libsonnet
file. Ex.azurevm: (import './signals/azurevm.libsonnet')(this),
. -
Create a new
.libsonnet
file on thecsp-mixin/panels
folder containing the configuration for each panel. Example:[panel_name]: this.signals.[panel_id].[signal_id].as[visualization_type]() + commonlib.panels.generic.[visualization_type].base.stylize(), # Example: avm_instance_count: this.signals.azurevmOverview.instanceCount.asStat() + commonlib.panels.generic.stat.base.stylize(), avm_cpu_utilization: this.signals.azurevm.cpuUtilization.asTimeSeries() + commonlib.panels.generic.timeSeries.base.stylize(),
Use asPanelMixin when you want to show multiple queries on the same panel. Example:
this.signals.azurevmOverview.diskReadOperations.asTimeSeries() + commonlib.panels.generic.timeSeries.base.stylize() + this.signals.azurevmOverview.diskWriteOperations.asPanelMixin()
Note: You can prefix the definition with the first letter of the provider and the first letter of the dashboard name you want to build. Ex.
avm_
for Azure Virtual Machine. -
Add all your row definitions into the
rows.libsonnet
file. You need to define:- The title of the row if you have one.
- The panel(s) that you want to show.
- The width and height of each panel
See example below:
avm_overview: [ # the row definition is optional. You can show all panels together without to group them by row. g.panel.row.new('Overview'), this.grafana.panels.azurevm.avm_instance_count + g.panel.timeSeries.gridPos.withW(12) + g.panel.timeSeries.gridPos.withH(5), this.grafana.panels.azurevm.avm_availability + g.panel.timeSeries.gridPos.withW(12) + g.panel.timeSeries.gridPos.withH(5), ... ]
-
Add your dashboard definition to the
dashboards.libsonnet
file. You can define a unique signal for gcp and azure dashboard or specify a dashboard just for one of the providers (azure or gcp). See example below:+ if csplib.config.uid == 'azure' then { [csplib.config.uid + '-virtualmachines.json']: local variables = csplib.signals.azurevm.getVariablesMultiChoice(); g.dashboard.new(csplib.config.dashboardNamePrefix + 'Virtual Machines') + g.dashboard.withUid(csplib.config.uid + '-virtualmachines') + g.dashboard.withTags(csplib.config.dashboardTags) + g.dashboard.withTimezone(csplib.config.dashboardTimezone) + g.dashboard.withRefresh(csplib.config.dashboardRefresh) + g.dashboard.timepicker.withTimeOptions(csplib.config.dashboardPeriod) + g.dashboard.withVariables(variables) + g.dashboard.withPanels( g.util.grid.wrapPanels( csplib.grafana.rows.avm_overview ) ), } else {},
-
Run the following command from the
csp-mixin
folder to generate all the dashboards injson
format and then import the one you are building on any instance. Check the foldercsp-mixin/dashout
that will contain all the generated dashboards:mixtool generate dashboards -J vendor -d dashout mixin.libsonnet
-
Lint and fix the files modified executing the following command from the root folder:
make fmt
- GCP - https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage
- The
quota
metrics are alpha, and don't seem to be getting fetched by Grafana Alloy, even when enabled as a metrics prefix. Perhaps this needs to be enabled for a project? replication
metrics are beta. The only metric which is being retrieved by alloy isreplication/meeting_rpo
which is consistently 1 for all buckets. It may not make sense to graph these metrics, but perhaps it's useful to have an alert?storage
metrics (object_count, total_bytes), have a "v2" which is beta. As such, this lib is using the (implied) v1 metrics which are GA.- There are no latency metrics available
- The
- Azure - https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-storage-storageaccounts-blobservices-metrics
Availability
is an available metric. It may not make sense to graph this, but perhaps it is useful to have an alert?- There are latency metrics. In our test environment there is very little (no?) traffic. What I have observed is that E2E latency, and server latency is the same value in our limited dataset. Perhaps this should only show a delta, I.E. if E2E is greater than server.
- Network throughput (ingress/egress) metrics for azure are gauges, not counters. Right now, the promql uses rate, which produces "odd" results. The other option, using
deriv
produces negative values with the available data, which is also suboptimal. We could just put the raw gauge value on the timeseries, and call it a day. 🤔
- AWS - TODO