feat(plugins): additional helm controller metrics #770

Open
wants to merge 7 commits into
base: main
from


gciezkowski-acc
Contributor

@gciezkowski-acc gciezkowski-acc commented Nov 25, 2024

Description

I added an additional metric to the Helm Controller that records whether each Helm reconciliation succeeded or failed.

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐛 Bug Fix
  • 📝 Documentation Update
  • 🎨 Style
  • 🧑‍💻 Code Refactor
  • 🔥 Performance Improvements
  • ✅ Test
  • 🤖 Build
  • 🔁 CI
  • 📦 Chore (Release)
  • ⏩ Revert

Related Tickets & Documents

Remove if not applicable

Added tests?

  • 👍 yes
  • 🙅 no, because they aren't needed
  • 🙋 no, because I need help
  • Separate ticket for tests # (issue/pr)

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Added to documentation?

  • 📜 README.md
  • 🤝 Documentation pages updated
  • 🙅 no documentation needed
  • (if applicable) generated OpenAPI docs for CRD changes

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (no documentation needed)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@gciezkowski-acc gciezkowski-acc changed the title feature(plugins): additional helm controller metrics feat(plugins): additional helm controller metrics Nov 25, 2024
@gciezkowski-acc gciezkowski-acc force-pushed the feat/439_additional_helm_controller_metrics branch from 49dcb61 to 7d4c08f Compare November 25, 2024 12:28
@gciezkowski-acc gciezkowski-acc marked this pull request as ready for review November 25, 2024 12:38
@gciezkowski-acc gciezkowski-acc requested a review from a team as a code owner November 25, 2024 12:38
@gciezkowski-acc gciezkowski-acc force-pushed the feat/439_additional_helm_controller_metrics branch from 7e1bf73 to a6c4de4 Compare November 25, 2024 13:47
Comment on lines 77 to 81
- alert: GreenhousePluginHasErrors
  annotations:
    summary: "Plugin has errors"
    description: "Plugin {{ $labels.plugin }} in organization {{ $labels.organization }} has more than 10 errors"
  expr: increase(greenhouse_plugin_reconcile_total{result="error"}[15m] > 0) by (plugin, organization)
Contributor

Suggested change
- alert: GreenhousePluginConstantlyFailing
  annotations:
    summary: "Plugin reconciliation is constantly failing"
    description: "Plugin {{ $labels.plugin }} in organization {{ $labels.organization }} keeps failing with {{ $labels.reason }}"
  # reason is kept in the aggregation so that {{ $labels.reason }} resolves in the description
  expr: sum by (organization, plugin, reason) (rate(greenhouse_plugin_reconcile_total{result="error"}[5m])) > 0
  for: 15m
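
To illustrate what the combination of a 5-minute `rate` and `for: 15m` is meant to achieve, here is a rough Go model of the evaluation. It is a simplified sketch and not how Prometheus computes things internally: the `sample` type and the `errorRate`, `firing`, `counterSeries`, and `demo` functions are hypothetical names, samples are assumed to be sorted by time, and counter resets and extrapolation are ignored.

```go
package main

import (
	"fmt"
	"time"
)

// sample is one scrape of a monotonically increasing counter (hypothetical type).
type sample struct {
	at    time.Time
	value float64
}

// errorRate approximates rate(counter[window]) at time now: the per-second
// increase between the first and last sample inside the window.
func errorRate(samples []sample, now time.Time, window time.Duration) float64 {
	var first, last *sample
	for i := range samples {
		s := &samples[i]
		if s.at.Before(now.Add(-window)) || s.at.After(now) {
			continue
		}
		if first == nil {
			first = s
		}
		last = s
	}
	if first == nil || first == last {
		return 0 // fewer than two samples in the window
	}
	return (last.value - first.value) / last.at.Sub(first.at).Seconds()
}

// firing models the "for" clause: the condition rate > 0 must hold at every
// evaluation step during the hold period that ends at now.
func firing(samples []sample, now time.Time, window, hold, step time.Duration) bool {
	for t := now.Add(-hold); !t.After(now); t = t.Add(step) {
		if errorRate(samples, t, window) <= 0 {
			return false
		}
	}
	return true
}

// counterSeries builds one sample per minute; the counter grows by 1 in
// every minute i for which failing(i) is true.
func counterSeries(start time.Time, minutes int, failing func(i int) bool) []sample {
	out := make([]sample, 0, minutes+1)
	v := 0.0
	for i := 0; i <= minutes; i++ {
		if i > 0 && failing(i) {
			v++
		}
		out = append(out, sample{at: start.Add(time.Duration(i) * time.Minute), value: v})
	}
	return out
}

// demo evaluates the rule for a plugin that fails every minute and for a
// plugin that failed exactly once.
func demo() (constant, blip bool) {
	start := time.Date(2024, 11, 25, 12, 0, 0, 0, time.UTC)
	now := start.Add(30 * time.Minute)
	always := counterSeries(start, 30, func(int) bool { return true })
	once := counterSeries(start, 30, func(i int) bool { return i == 20 })
	constant = firing(always, now, 5*time.Minute, 15*time.Minute, time.Minute)
	blip = firing(once, now, 5*time.Minute, 15*time.Minute, time.Minute)
	return constant, blip
}

func main() {
	constant, blip := demo()
	fmt.Println("constantly failing plugin fires:", constant)
	fmt.Println("single failure fires:", blip)
}
```

In this model a plugin that fails on every reconciliation triggers the alert, while a single isolated failure does not, which matches the intent expressed by the alert name.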

Contributor Author

Done

Comment on lines 49 to 55
case greenhousev1alpha1.HelmDriftDetectedCondition:
	if condition.IsTrue() {
		result = metricResultError
		reason = metricReasonDiffFailed
	}
case greenhousev1alpha1.StatusUpToDateCondition:
	if condition.IsFalse() {
		result = metricResultError
		reason = metricReasonUpgradeFailed
	}
Contributor

These conditions do not reflect errors in reconciling the Helm release.

Contributor Author

Done

Comment on lines 61 to 68

	pluginReconcileTotalLabels := prometheus.Labels{
		"pluginDefinition": plugin.Spec.PluginDefinition,
		"clusterName":      plugin.Spec.ClusterName,
		"plugin":           plugin.Name,
		"organization":     plugin.Namespace,
		"result":           result,
		"reason":           reason,
	}
	pluginReconcileTotal.With(pluginReconcileTotalLabels).Inc()
}
Contributor

Currently, the metric does not reveal why the HelmReconcileFailedCondition is set to true.
Could you instead increment the metric at the points in the code where the condition is set to True or False?

Suggested change
type (
	metricResult string
	metricReason string
)

const (
	// Prefixed names avoid shadowing Go's predeclared error identifier.
	metricResultSuccess metricResult = "success"
	metricResultError   metricResult = "error"

	metricReasonTemplateFailed  metricReason = "template_failed"
	metricReasonDiffFailed      metricReason = "diff_failed"
	metricReasonUpgradeFailed   metricReason = "upgrade_failed"
	metricReasonUninstallFailed metricReason = "uninstall_failed"
)

// updateMetrics increments the reconcile counter for the given plugin, result, and reason.
func updateMetrics(plugin *greenhousev1alpha1.Plugin, result metricResult, reason metricReason) {
	pluginReconcileTotalLabels := prometheus.Labels{
		"pluginDefinition": plugin.Spec.PluginDefinition,
		"clusterName":      plugin.Spec.ClusterName,
		"plugin":           plugin.Name,
		"organization":     plugin.Namespace,
		// prometheus.Labels is a map[string]string, so the typed values are converted.
		"result": string(result),
		"reason": string(reason),
	}
	pluginReconcileTotal.With(pluginReconcileTotalLabels).Inc()
}
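
For readers who want to try the idea without the controller code base, here is a self-contained sketch of calling such a helper at the points where the outcome is known. It rests on several assumptions: `counterVec` is a small in-memory stand-in for the real Prometheus counter vector, `plugin` is a trimmed-down stand-in for `greenhousev1alpha1.Plugin`, and `metricReasonNone`, `labelsFor`, and `demo` are hypothetical names that do not appear in the pull request. The constants carry a `metricResult`/`metricReason` prefix so that none of them shadows Go's predeclared `error` identifier, and the typed values are converted with `string(...)` because label maps hold plain strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type (
	metricResult string
	metricReason string
)

const (
	metricResultSuccess metricResult = "success"
	metricResultError   metricResult = "error"

	metricReasonNone          metricReason = "" // hypothetical: nothing failed
	metricReasonUpgradeFailed metricReason = "upgrade_failed"
)

// plugin is a trimmed-down stand-in for greenhousev1alpha1.Plugin.
type plugin struct {
	Name, Namespace, PluginDefinition, ClusterName string
}

// counterVec is an in-memory stand-in for a Prometheus counter vector:
// it keeps one counter per distinct label set.
type counterVec struct {
	counts map[string]int
}

func newCounterVec() *counterVec { return &counterVec{counts: map[string]int{}} }

// key serialises a label set deterministically so it can index the map.
func key(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, n+"="+labels[n])
	}
	return strings.Join(parts, ",")
}

func (c *counterVec) inc(labels map[string]string)     { c.counts[key(labels)]++ }
func (c *counterVec) get(labels map[string]string) int { return c.counts[key(labels)] }

// labelsFor builds the label set; the typed result and reason are converted
// to plain strings because the label map is a map[string]string.
func labelsFor(p plugin, result metricResult, reason metricReason) map[string]string {
	return map[string]string{
		"pluginDefinition": p.PluginDefinition,
		"clusterName":      p.ClusterName,
		"plugin":           p.Name,
		"organization":     p.Namespace,
		"result":           string(result),
		"reason":           string(reason),
	}
}

// updateMetrics is meant to be called wherever the reconcile outcome is decided.
func updateMetrics(c *counterVec, p plugin, result metricResult, reason metricReason) {
	c.inc(labelsFor(p, result, reason))
}

// demo simulates two failed upgrades followed by one successful reconciliation.
func demo() (errCount, okCount int) {
	c := newCounterVec()
	p := plugin{Name: "my-plugin", Namespace: "my-org", PluginDefinition: "my-definition", ClusterName: "my-cluster"}
	updateMetrics(c, p, metricResultError, metricReasonUpgradeFailed)
	updateMetrics(c, p, metricResultError, metricReasonUpgradeFailed)
	updateMetrics(c, p, metricResultSuccess, metricReasonNone)
	errCount = c.get(labelsFor(p, metricResultError, metricReasonUpgradeFailed))
	okCount = c.get(labelsFor(p, metricResultSuccess, metricReasonNone))
	return errCount, okCount
}

func main() {
	errCount, okCount := demo()
	fmt.Println("errors:", errCount, "successes:", okCount)
}
```

Because the reason is passed in at the call site, each failure path can report its own reason instead of having it inferred later from the plugin's status conditions.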

Contributor Author

Done

@gciezkowski-acc gciezkowski-acc force-pushed the feat/439_additional_helm_controller_metrics branch from 88770d2 to d91bd71 Compare November 27, 2024 14:15
@gciezkowski-acc gciezkowski-acc force-pushed the feat/439_additional_helm_controller_metrics branch from 989be64 to c9e05e2 Compare November 28, 2024 10:40
Development

Successfully merging this pull request may close these issues.

[FEAT] - additional HelmController metrics
2 participants