feat(plugins): additional helm controller metrics #770
base: main
- alert: GreenhousePluginHasErrors
  annotations:
    summary: "Plugin has errors"
    description: "Plugin {{ $labels.plugin }} in organization {{ $labels.organization }} has more then 10 errors"
  expr: increase(greenhouse_plugin_reconcile_total{result="error"}[15m] > 0) by (plugin, organization)
Suggested change:
- alert: GreenhousePluginConstantlyFailing
  annotations:
    summary: "Plugin reconciliation is constantly failing"
    description: "Plugin {{ $labels.plugin }} in organization {{ $labels.organization }} keeps failing with {{ $labels.reason}}"
  expr: sum by (organization, plugin) (rate(greenhouse_plugin_reconcile_total{result="error"}[5m])) > 0
  for: 15m
Done
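The practical difference between the two alert rules in this thread is the `for: 15m` clause: a single failed reconcile makes the 5-minute rate positive only briefly, while a plugin that keeps failing holds the condition long enough to fire. The following Go sketch simulates that behavior under simplifying assumptions: one evaluation per minute, a plain "did the counter grow over the window" check standing in for `rate(...[5m]) > 0`, and hypothetical helper names (`firstFiring`, `burst`, `steady`). It is an illustration, not how Prometheus is implemented.

```go
package main

import "fmt"

// conditionHolds is a rough stand-in for `rate(counter[window]) > 0`:
// it reports whether the cumulative counter grew during the window ending at minute t.
func conditionHolds(counter []float64, t, window int) bool {
	start := t - window
	if start < 0 {
		start = 0
	}
	return counter[t] > counter[start]
}

// firstFiring returns the first minute at which the alert would fire, or -1.
// The alert is pending while the condition holds and fires once it has held
// continuously for at least forMinutes (the `for:` clause).
func firstFiring(counter []float64, window, forMinutes int) int {
	pendingSince := -1
	for t := range counter {
		if !conditionHolds(counter, t, window) {
			pendingSince = -1 // condition broke, the pending state resets
			continue
		}
		if pendingSince < 0 {
			pendingSince = t
		}
		if t-pendingSince >= forMinutes {
			return t
		}
	}
	return -1
}

// burst builds n per-minute samples with a single error recorded at minute `at`.
func burst(n, at int) []float64 {
	out := make([]float64, n)
	for t := at; t < n; t++ {
		out[t] = 1
	}
	return out
}

// steady builds n per-minute samples with one new error every minute.
func steady(n int) []float64 {
	out := make([]float64, n)
	for t := range out {
		out[t] = float64(t)
	}
	return out
}

func main() {
	fmt.Println("single error:", firstFiring(burst(41, 5), 5, 15))
	fmt.Println("constant failures:", firstFiring(steady(41), 5, 15))
}
```

In this model the single error never reaches the firing state, whereas the constantly failing plugin does, which matches the intent of the alert name `GreenhousePluginConstantlyFailing`.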
pkg/controllers/plugin/metrics.go
Outdated
case greenhousev1alpha1.HelmDriftDetectedCondition:
	if condition.IsTrue() {
		result = metricResultError
		reason = metricReasonDiffFailed
	}
case greenhousev1alpha1.StatusUpToDateCondition:
	if condition.IsFalse() {
		result = metricResultError
		reason = metricReasonUpgradeFailed
	}
These conditions do not reflect errors in reconciling the Helm release.
Done
pkg/controllers/plugin/metrics.go
Outdated
	pluginReconcileTotalLabels := prometheus.Labels{
		"pluginDefinition": plugin.Spec.PluginDefinition,
		"clusterName":      plugin.Spec.ClusterName,
		"plugin":           plugin.Name,
		"organization":     plugin.Namespace,
		"result":           result,
		"reason":           reason,
	}
	pluginReconcileTotal.With(pluginReconcileTotalLabels).Inc()
}
Currently, it is not possible to know why the HelmReconcileFailedCondition is set to True.
Could you instead increment the metric at the places in the code where the condition is set to True or False?
Suggested change:
type (
	metricResult string
	metricReason string
)

// The metricResult/metricReason prefixes avoid shadowing the builtin error type.
const (
	metricResultSuccess metricResult = "success"
	metricResultError   metricResult = "error"

	metricReasonTemplateFailed  metricReason = "template_failed"
	metricReasonDiffFailed      metricReason = "diff_failed"
	metricReasonUpgradeFailed   metricReason = "upgrade_failed"
	metricReasonUninstallFailed metricReason = "uninstall_failed"
)

// updateMetrics increments the reconcile counter for the given plugin, result, and reason.
func updateMetrics(plugin *greenhousev1alpha1.Plugin, result metricResult, reason metricReason) {
	pluginReconcileTotalLabels := prometheus.Labels{
		"pluginDefinition": plugin.Spec.PluginDefinition,
		"clusterName":      plugin.Spec.ClusterName,
		"plugin":           plugin.Name,
		"organization":     plugin.Namespace,
		// prometheus.Labels is a map[string]string, so the typed values are converted explicitly.
		"result": string(result),
		"reason": string(reason),
	}
	pluginReconcileTotal.With(pluginReconcileTotalLabels).Inc()
}
Done
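The review request in this thread is to record the metric at the point where the controller decides the outcome, so the `reason` label reflects the step that actually failed. The sketch below illustrates that call pattern using only the standard library. Everything in it beyond the `updateMetrics` name, the label keys, and the result/reason values is an assumption made for illustration: `counterVec` is a hypothetical stand-in for a Prometheus counter vector, `plugin` is a reduced stand-in for the Plugin resource, and `reconcile` with its injected `template` and `upgrade` steps is not the controller's real code.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

type (
	metricResult string
	metricReason string
)

const (
	metricResultSuccess        metricResult = "success"
	metricResultError          metricResult = "error"
	metricReasonNone           metricReason = "" // assumption: empty reason on success
	metricReasonTemplateFailed metricReason = "template_failed"
	metricReasonUpgradeFailed  metricReason = "upgrade_failed"
)

// errStep is a sample error used by the demo steps.
var errStep = errors.New("step failed")

// counterVec is a hypothetical in-memory stand-in for a Prometheus counter vector.
type counterVec struct{ counts map[string]int }

// key builds a stable series identifier from a label set.
func key(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

func (c *counterVec) inc(labels map[string]string)       { c.counts[key(labels)]++ }
func (c *counterVec) value(labels map[string]string) int { return c.counts[key(labels)] }

// plugin is a reduced, hypothetical version of the Plugin resource.
type plugin struct{ Name, Namespace string }

var pluginReconcileTotal = &counterVec{counts: map[string]int{}}

func labelsFor(p plugin, result metricResult, reason metricReason) map[string]string {
	return map[string]string{
		"plugin":       p.Name,
		"organization": p.Namespace,
		"result":       string(result),
		"reason":       string(reason),
	}
}

func updateMetrics(p plugin, result metricResult, reason metricReason) {
	pluginReconcileTotal.inc(labelsFor(p, result, reason))
}

// reconcile records the metric exactly where the failure condition would be set.
func reconcile(p plugin, template, upgrade func() error) error {
	if err := template(); err != nil {
		// The controller would set the failure condition to True here.
		updateMetrics(p, metricResultError, metricReasonTemplateFailed)
		return err
	}
	if err := upgrade(); err != nil {
		updateMetrics(p, metricResultError, metricReasonUpgradeFailed)
		return err
	}
	// The controller would set the failure condition to False here.
	updateMetrics(p, metricResultSuccess, metricReasonNone)
	return nil
}

func main() {
	p := plugin{Name: "demo", Namespace: "demo-org"}
	ok := func() error { return nil }
	fail := func() error { return errStep }
	_ = reconcile(p, ok, fail)
	_ = reconcile(p, ok, ok)
	fmt.Println(pluginReconcileTotal.value(labelsFor(p, metricResultError, metricReasonUpgradeFailed)))
}
```

Passing the reason explicitly at each call site keeps the metric independent of how conditions are later aggregated on the resource status, which is the concern raised in the review comment.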
Description
I added an additional metric to the Helm controller that records whether each Helm reconciliation succeeded or failed.