feat: extra OPTEL collector configuration. #581
base: main
Conversation
@kubewarden/kubewarden-developers I'm adding this to the Pending review column as a draft to give you the opportunity to sneak a peek and give initial feedback while I'm testing this with a Stackstate cluster.
I think this looks good. There's still the chance of someone re-defining (and I guess overloading?) our configuration by specifying the same receiver/processor/...; there's nothing we can do about that.
I just wonder if we shouldn't give up and let the user overwrite everything with their own values. That would simplify the helm chart (get rid of all the if .Values.<> statements that check for non-empty values).
Finally, I love the unit tests you added 👏 ❤️
Yes, to mitigate this issue I've added a schema validation file in the chart. Therefore, the helm chart will block the usage of the same names.
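For illustration, a guard like that can be expressed in values.schema.json roughly as follows; the exact structure of the chart's real schema file may differ, and the reserved names listed here are just examples:

{
  "properties": {
    "telemetry": {
      "properties": {
        "extraConfig": {
          "properties": {
            "receivers": {
              "propertyNames": { "not": { "enum": ["otlp"] } }
            },
            "exporters": {
              "propertyNames": { "not": { "enum": ["otlp/jaeger", "prometheus"] } }
            }
          }
        }
      }
    }
  }
}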
I considered that first. But then I remembered that our components expect the collector to be a sidecar. Therefore, considering that we are not planning to change that for now, if we allow the free-form approach the user could break our environment by changing the deployment mode, intentionally or not. I understand that we can document this. But I think even with documentation, we will get issues from users complaining that "telemetry is broken". But this is open to discussion. :)
Thanks! I really liked the tests as well.
@jvanz I like the approach (and the tests). Just one comment: the right short name for OpenTelemetry is OTEL, not OPTEL.
I also like it, I think it is understandable. Maybe a commented line saying that the extra configuration takes precedence would help. Another option to consider would be to provide an "enum switch" like we and other Rancher charts used to do for cert-manager (example), if we foresee that this is going to be complicated and we can provide some defaults. E.g.:
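For instance (the key name and allowed values below are purely hypothetical):

telemetry:
  # hypothetical enum switch: "sidecar" keeps the current behavior,
  # "custom" lets the user supply the collector config, "none" disables it
  collectorMode: "sidecar"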
Actually, the extra configuration does not take precedence. It lives alongside the previous configuration. In other words, the old otel pipeline remains the same; nothing changes. But if the user defines an extra configuration, it is merged into the otel collector configuration as well.
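A rough sketch of how such a merge can be done in the template; kubewarden.otelDefaultConfig is a hypothetical named template standing in for the chart's built-in collector configuration:

{{- /* Merge the user-supplied extraConfig on top of the chart defaults.
       With mustMergeOverwrite, the right-most argument wins on conflicts. */ -}}
{{- $default := include "kubewarden.otelDefaultConfig" . | fromYaml -}}
{{- $merged := mustMergeOverwrite $default (.Values.telemetry.extraConfig | default dict) -}}
config:
  {{- $merged | toYaml | nindent 2 }}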
@kubewarden/kubewarden-developers using this PR with the following values file, I'm able to have a minimal configuration that keeps the original Kubewarden OTEL pipelines and adds another one to send data to Stackstate:

telemetry:
  metrics:
    enabled: True
    port: 8080
  tracing:
    enabled: True
    jaeger:
      endpoint: "my-open-telemetry-collector.jaeger.svc.cluster.local:4317"
      tls:
        insecure: true
  extraOtelConfig:
    envFrom:
      - secretRef:
          name: open-telemetry-collector
  extraConfig:
    processors:
      resource:
        attributes:
          - key: k8s.cluster.name
            action: upsert
            value: k3d-kubewarden
          - key: service.instance.id
            from_attribute: k8s.pod.uid
            action: insert
    extensions:
      bearertokenauth:
        scheme: SUSEObservability
        token: "${env:API_KEY}"
    exporters:
      debug:
        verbosity: normal
      otlphttp/stackstate:
        auth:
          authenticator: bearertokenauth
        endpoint: https://otlp-stackstate.oldfield.arch.nue2.suse.org:443
        tls:
          insecure_skip_verify: true
    service:
      extensions:
        - bearertokenauth
      pipelines:
        traces/stackstate:
          receivers: [otlp]
          processors: [resource]
          exporters: [otlphttp/stackstate]
        metrics/stackstate:
          receivers: [otlp]
          processors: [resource]
          exporters: [debug, otlphttp/stackstate]

This is the resulting OTEL collector:

---
# Source: kubewarden-controller/templates/opentelemetry-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: kubewarden
  namespace: default
  labels:
    helm.sh/chart: kubewarden-controller-3.1.0
    app.kubernetes.io/name: kubewarden-controller
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "v1.18.0"
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kubewarden
  annotations:
spec:
  mode: sidecar
  envFrom:
    - secretRef:
        name: open-telemetry-collector
  config:
    extensions:
      bearertokenauth:
        scheme: SUSEObservability
        token: ${env:API_KEY}
    receivers:
      otlp:
        protocols:
          grpc: {}
    processors:
      batch: {}
      resource:
        attributes:
          - action: upsert
            key: k8s.cluster.name
            value: k3d-kubewarden
          - action: insert
            from_attribute: k8s.pod.uid
            key: service.instance.id
    exporters:
      otlp/jaeger:
        endpoint: my-open-telemetry-collector.jaeger.svc.cluster.local:4317
        tls:
          insecure: true
      prometheus:
        endpoint: ":8080"
      debug:
        verbosity: normal
      otlphttp/stackstate:
        auth:
          authenticator: bearertokenauth
        endpoint: https://otlp-stackstate.oldfield.arch.nue2.suse.org:443
        tls:
          insecure_skip_verify: true
    service:
      extensions:
        - bearertokenauth
      pipelines:
        metrics/stackstate:
          exporters:
            - debug
            - otlphttp/stackstate
          processors:
            - resource
          receivers:
            - otlp
        traces/stackstate:
          exporters:
            - otlphttp/stackstate
          processors:
            - resource
          receivers:
            - otlp
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger]
@kubewarden/kubewarden-developers, during the tests I've figured out that our current approach of deploying the OTEL collector as a sidecar prevents us from using the Kubernetes Attributes processor, which is pretty important to fetch metadata from the cluster and to associate the metrics/traces with the resources running in the cluster:
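For reference, the processor configuration in question looks roughly like this (a sketch based on the upstream k8sattributes documentation; the exact attributes required by Stackstate may differ):

processors:
  k8sattributes:
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.namespace.name
        - k8s.node.name
        - k8s.deployment.name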
This is the configuration required for Stackstate, as shown in the documentation example. Therefore, I think we need a second iteration on this to improve our OTEL deployment, allowing us to enable this processor, which is only available in non-sidecar deployment modes.
@kubewarden/kubewarden-developers as we've discussed during a call, I've updated this PR to move the OTEL configuration from the template to the values file. I've also updated the tests to reflect that and started a new test to cover the controller deployment, to test a change there as well. I'm working on the end-to-end tests repository to make it work with the latest changes as well.
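To illustrate the shape of those tests, assuming the chart uses the helm-unittest plugin (the template path and values below are hypothetical):

suite: opentelemetry collector
templates:
  - templates/opentelemetry-collector.yaml  # hypothetical path
tests:
  - it: merges extraConfig into the collector configuration
    set:
      telemetry:
        metrics:
          enabled: true
        extraConfig:
          exporters:
            debug:
              verbosity: normal
    asserts:
      - equal:
          path: spec.config.exporters.debug.verbosity
          value: normal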
Let's discuss more about that after hackweek.
Also, let's not forget the Rancher UI has a configuration helper for metrics and tracing. Merging this PR would break it.
@@ -18,7 +18,7 @@ spec:
   {{- range keys .Values.podAnnotations }}
   {{ . | quote }}: {{ get $.Values.podAnnotations . | quote}}
   {{- end }}
-  {{- if or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled}}
+  {{- if or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled }}
We have to add a new boolean flag inside of the helm chart values file that determines if we have to trigger otel's sidecar injection.
If a user already has otel deployed using a daemonset, they will not want to have the sidecar injected.
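A minimal sketch of such a flag; the key name and the gating below are assumptions, not the final design:

# values.yaml (hypothetical key)
telemetry:
  sidecar:
    enabled: true

# deployment template: only add the injection annotation when the flag is on
{{- if and .Values.telemetry.sidecar.enabled (or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled) }}
sidecar.opentelemetry.io/inject: "true"
{{- end }}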
env:
  - name: KUBEWARDEN_POLICY_SERVER_SERVICES_METRICS_PORT
    value: "{{ .Values.telemetry.metrics.port | default 8080 }}"
{{- toYaml .Values.env | nindent 10 }}
TODO: the kubewarden controller is in charge of creating the Deployment resource of the Policy Server. We have to extend the controller with a new flag/env variable that determines whether the sidecar annotation has to be added to the Policy Server Deployment.
This env variable must then be exposed here.
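A sketch of what exposing it could look like; the variable name is hypothetical:

env:
  - name: KUBEWARDEN_POLICY_SERVER_SERVICES_METRICS_PORT
    value: "{{ .Values.telemetry.metrics.port | default 8080 }}"
  # hypothetical: lets the controller decide whether to add the sidecar
  # injection annotation to the Policy Server Deployment it creates
  - name: KUBEWARDEN_OTEL_SIDECAR_ENABLED
    value: "{{ .Values.telemetry.sidecar.enabled | default true }}"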
      processors: [batch]
      exporters: [otlp/jaeger]
{{- end }}
{{- toYaml .Values.telemetry.otelSpec | nindent 2 }}
@jvanz, please don't kill me... but today, while looking into a failure of the e2e tests against this PR, we realized everything is a bit more complicated.
Assume this PR is merged. Inside of the values.yaml file we have what used to be the previous configuration of the collector. As a user I can set .Values.telemetry.enabled = true (we still have this key) and set .Values.telemetry.tracing.enabled = false. In this scenario the tracing configuration is still under .Values.telemetry.otelSpec unless the user changes it. As a result, the collector would be configured to enable metrics, but it would also have the tracing configuration provided by the default values (which would constantly complain while attempting to send the traces to a non-existing service).
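To make the failure mode concrete, a values combination like this sketch would reproduce it:

telemetry:
  enabled: true
  tracing:
    enabled: false      # tracing is "off"...
  # ...but the default otelSpec still carries the tracing pipeline and the
  # otlp/jaeger exporter, so the collector keeps trying to reach a
  # non-existing Jaeger service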
@kravciak, @fabriziosestito and I discussed what to do. Our idea, which we can discuss further with the other maintainers, is to support two scenarios:
- The user enables the otel integration with the sidecar model. We take the old code, add a new boolean flag inside of the values.yaml, and add some new if statements here.
- The user disables the otel integration done with the sidecar and creates their own otel Collector configuration (maybe creating the actual object on their own, without this helm chart?).
We have to do some brainstorming about that...
I understand and agree with the finding. By the way, thanks for that.
But I'm working on the requested changes and wondering if it makes sense to merge this now. Considering that we need to change the controller to allow Kubewarden to send data to OTEL collectors not running as a sidecar, does it make sense to change the Helm chart now? Our stack only supports the OTEL sidecar right now. If we add a flag to disable the sidecar, telemetry will not work at all; therefore, any possible customization of the OTEL collector will not be useful. I think we can park this change, change the controller to allow other OTEL deployment models, and then come back here and do all the changes at once. Otherwise, if I apply the requested changes now, we will add a flag which disables the sidecar and telemetry will not work, which is equivalent to what we can do right now by disabling tracing and metrics in the current values structure.
@kubewarden/kubewarden-developers what do you think?
Trying to understand the whole thing.
Ideally I would like to offer, via a values key that acts as an enum, the possibility of:
- the otel sidecar (current approach);
- an otel collector configuration managed by the chart that is compatible with stackstate;
- an otel collector configuration provided by the user, through the chart.
I agree with working on the controller first, to allow sending data to otel collectors not on the sidecar, and parking this PR.
I think we should park this PR, define the whole strategy and then push everything out at the same time.
@jvanz do you want to write a proposal? Maybe this is worth a really lightweight RFC
Sorry, I think we can still work on this PR. I was overthinking it. The goal here is just to add more flexibility to the OTEL collector configuration, not to change the deployment mode. Therefore, I've updated the PR, changing it to support the old way of configuration.
If you think we can merge it, fine. Otherwise, we can park it until we fix the controller.
I think we should park it as well; I have doubts about the user specifying the spec of the OpenTelemetryCollector by using values.
I would prefer to have our sidecar "sane defaults" configuration either on or entirely off, and let the user deal with creating the OpenTelemetryCollector independently from our helm chart.
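In other words, something like the following, created by the user outside the chart (all names here are illustrative):

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: my-collector          # illustrative
  namespace: kubewarden
spec:
  mode: deployment            # not managed by the kubewarden helm chart
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [debug]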
Description
Adds additional telemetry configuration fields to allow users to add their custom OpenTelemetry collector configuration together with the Kubewarden configuration.
Fix #573