
feat: extra OPTEL collector configuration. #581

Open · wants to merge 1 commit into base: main

Conversation

@jvanz (Member) commented Nov 5, 2024

Description

Adds additional telemetry configuration fields to allow users to add their custom OpenTelemetry collector configuration together with the Kubewarden configuration.

Fix #573

@jvanz (Member, Author) commented Nov 6, 2024

@kubewarden/kubewarden-developers I'm adding this to the Pending review column as a draft to give you the opportunity to take a sneak peek and give initial feedback while I'm testing this with a Stackstate cluster.

@flavio (Member) left a comment

I think this looks good; there's still the chance of someone re-defining (and I guess overloading?) our configuration by specifying the same receiver/processor/... There's nothing we can do about that.

I just wonder if we shouldn't give up and let the user overwrite everything with their own values. That would simplify the Helm chart (get rid of all the if .Values.<> checks for non-empty values).

Finally, I love the unit tests you added 👏 ❤️

@jvanz (Member, Author) commented Nov 6, 2024

> I think this looks good; there's still the chance of someone re-defining (and I guess overloading?) our configuration by specifying the same receiver/processor/... There's nothing we can do about that.

Yes, to mitigate this issue I've added a schema validation file to the chart. Therefore, the Helm chart will block the use of duplicate names.

> I just wonder if we shouldn't give up and let the user overwrite everything with their own values. That would simplify the Helm chart (get rid of all the if .Values.<> checks for non-empty values).

I considered that first. But then I remembered that our components expect the collector to be a sidecar. Considering that we are not planning to change that for now, if we allow the free-form approach we can break our environment because the user might change the deployment mode, intentionally or not. I understand that we could document this, but I think that even with documentation we would get issues from users complaining that "telemetry is broken". But this is open to discussion. :)
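For illustration only (using a hypothetical free-form key, not something this PR adds), this is the kind of value a user could set that would silently break the sidecar assumption:

telemetry:
  otelSpec:
    # our components expect the collector to run as a sidecar; switching the
    # deployment mode here would break metrics and tracing for the whole stack
    mode: deployment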

> Finally, I love the unit tests you added 👏 ❤️

Thanks! I really liked the tests as well.

@jvanz jvanz requested a review from flavio November 6, 2024 19:30
@fabriziosestito (Contributor) commented Nov 7, 2024

@jvanz I like the approach (and the tests).

Just one comment: the right short name for OpenTelemetry is otel, not optel, so we might want to change the names of the values accordingly.

@viccuad (Member) commented Nov 11, 2024

I also like it; I think it is understandable. Maybe add a commented line saying that the extra* values have precedence over the other collector configuration.

Another option to consider would be to provide an "enum switch" like we and other Rancher charts used to do for cert-manager (example), if we foresee that this is going to be complicated and we can provide some defaults. E.g.: manual-config-with-extra, stackstate-config, etc.
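A hypothetical values sketch of such an enum switch (the key name and the allowed values are illustrative, borrowed from the names mentioned above, and not part of this PR):

telemetry:
  # one of: sidecar | manual-config-with-extra | stackstate-config
  mode: sidecar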

@jvanz (Member, Author) commented Nov 14, 2024

> I also like it; I think it is understandable. Maybe add a commented line saying that the extra* values have precedence over the other collector configuration.

Actually, the extra configuration does not have precedence. It lives together with the previous configuration. In other words, the old otel pipeline remains the same; nothing changes. But if the user defines an extra configuration, it will be merged into the otel collector configuration as well.
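A minimal sketch of how that merge could look inside the collector template, assuming the extraConfig value introduced by this PR (the indentation and surrounding template are illustrative, not the actual chart code):

config:
  receivers:
    otlp:
      protocols:
        grpc: {}
    # user-provided receivers are appended next to the built-in ones
    {{- with .Values.telemetry.extraConfig.receivers }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  processors:
    batch: {}
    # the same pattern applies to processors, exporters, extensions and pipelines
    {{- with .Values.telemetry.extraConfig.processors }}
    {{- toYaml . | nindent 4 }}
    {{- end }}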

@jvanz (Member, Author) commented Nov 14, 2024

@kubewarden/kubewarden-developers using this PR with the following values file, I'm able to have a minimal configuration that keeps the original Kubewarden OTEL pipelines and adds another one to send data to Stackstate:

telemetry:
  metrics:
    enabled: True
    port: 8080
  tracing:
    enabled: True
    jaeger:
      endpoint: "my-open-telemetry-collector.jaeger.svc.cluster.local:4317"
      tls:
        insecure: true
  extraOtelConfig:
    envFrom:
      - secretRef:
          name: open-telemetry-collector
  extraConfig: 
    processors:
      resource:
        attributes:
        - key: k8s.cluster.name
          action: upsert
          value: k3d-kubewarden
        - key: service.instance.id
          from_attribute: k8s.pod.uid
          action: insert
    extensions:
      bearertokenauth:
        scheme: SUSEObservability
        token: "${env:API_KEY}"
    exporters:
      debug:
        verbosity: normal
      otlphttp/stackstate:
        auth:
          authenticator: bearertokenauth
        endpoint: https://otlp-stackstate.oldfield.arch.nue2.suse.org:443
        tls:
          insecure_skip_verify: true
    service:
      extensions:
        - bearertokenauth
      pipelines:
        traces/stackstate:
          receivers: [otlp]
          processors: [resource]
          exporters: [otlphttp/stackstate]
        metrics/stackstate:
          receivers: [otlp]
          processors: [resource]
          exporters: [debug, otlphttp/stackstate]

This is the resulting OTEL collector:

---
# Source: kubewarden-controller/templates/opentelemetry-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: kubewarden
  namespace: default
  labels:
    helm.sh/chart: kubewarden-controller-3.1.0
    app.kubernetes.io/name: kubewarden-controller
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "v1.18.0"
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kubewarden
  annotations:
    
spec:
  mode: sidecar
  envFrom:
    - secretRef:
        name: open-telemetry-collector
  config:
    extensions:
      bearertokenauth:
        scheme: SUSEObservability
        token: ${env:API_KEY}
    receivers:
      otlp:
        protocols:
          grpc: {}
    processors:
      batch: {}
      resource:
        attributes:
        - action: upsert
          key: k8s.cluster.name
          value: k3d-kubewarden
        - action: insert
          from_attribute: k8s.pod.uid
          key: service.instance.id
    exporters:
      otlp/jaeger:
        endpoint: my-open-telemetry-collector.jaeger.svc.cluster.local:4317
        tls:
          insecure: true
      prometheus:
        endpoint: ":8080"
      
      debug:
        verbosity: normal
      otlphttp/stackstate:
        auth:
          authenticator: bearertokenauth
        endpoint: https://otlp-stackstate.oldfield.arch.nue2.suse.org:443
        tls:
          insecure_skip_verify: true
    service:
      extensions:
        - bearertokenauth
      pipelines:
        metrics/stackstate:
          exporters:
          - debug
          - otlphttp/stackstate
          processors:
          - resource
          receivers:
          - otlp
        traces/stackstate:
          exporters:
          - otlphttp/stackstate
          processors:
          - resource
          receivers:
          - otlp
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger]

@jvanz (Member, Author) commented Nov 14, 2024

@kubewarden/kubewarden-developers, during the tests I've figured out that our current approach of deploying the OTEL collector as a sidecar prevents us from using the Kubernetes Attributes processor, which is pretty important for fetching metadata from the cluster and associating the metrics/traces with the resources running in the cluster:

> The Kubernetes Attributes Processor is one of the most important components for a collector running in Kubernetes. Any collector receiving application data should use it. Because it adds Kubernetes context to your telemetry, the Kubernetes Attributes Processor lets you correlate your application’s traces, metrics, and logs signals with your Kubernetes telemetry, such as pod metrics and traces.

This is the configuration required for Stackstate, as shown in the documentation example. Therefore, I think we need a second iteration on this to improve our OTEL deployment, allowing us to enable this processor, which is only available in the DaemonSet and Deployment deployment modes. This may require changes in the policy server and the Kubewarden controller, because these components always assume the collector is accessible in the same pod.
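For reference, a sketch of the processor under discussion, based on the upstream k8sattributes processor options (not part of this PR); it needs the collector to run as a Deployment or DaemonSet with RBAC permissions to read pod metadata:

processors:
  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.node.name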

@jvanz jvanz marked this pull request as ready for review November 14, 2024 11:55
@jvanz jvanz requested a review from a team as a code owner November 14, 2024 11:55
@jvanz (Member, Author) commented Nov 14, 2024

@kubewarden/kubewarden-developers as we've discussed during a call, I've updated this PR to move the OTEL configuration from the template to the values file. I've also updated the tests to reflect that and started a new test covering the controller deployment to test a change there as well.

I'm working on the end-to-end tests repository to make it work with the latest changes as well.

@flavio (Member) left a comment

Let's discuss more about that after hackweek.

Also, let's not forget the Rancher UI has a configuration helper for metrics and tracing. Merging this PR would break it.

@@ -18,7 +18,7 @@ spec:
{{- range keys .Values.podAnnotations }}
{{ . | quote }}: {{ get $.Values.podAnnotations . | quote}}
{{- end }}
-{{- if or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled}}
+{{- if or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled }}
Review comment (Member):

We have to add a new boolean flag inside the Helm chart values file that determines whether we have to trigger otel's sidecar injection.

If a user already has otel deployed using a DaemonSet, they will not want to have the sidecar injected.
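A minimal sketch of what that could look like, assuming a hypothetical telemetry.sidecar.enabled value; the annotation is the standard OpenTelemetry operator sidecar-injection annotation:

# values.yaml (hypothetical flag)
telemetry:
  sidecar:
    enabled: true

# pod template annotations in the chart
{{- if .Values.telemetry.sidecar.enabled }}
sidecar.opentelemetry.io/inject: "true"
{{- end }}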

env:
- name: KUBEWARDEN_POLICY_SERVER_SERVICES_METRICS_PORT
value: "{{ .Values.telemetry.metrics.port | default 8080 }}"
{{- toYaml .Values.env | nindent 10 }}
Review comment (Member):

TODO: the kubewarden controller is in charge of creating the Deployment resource of the Policy Server. We have to extend the controller to have a new flag/env variable that determines whether the sidecar annotation has to be added to the Policy Server Deployment.

This env variable must then be exposed here
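A sketch of how such a variable could be surfaced in this env block (the variable name and the values key are hypothetical, only to illustrate the idea):

env:
  - name: KUBEWARDEN_POLICY_SERVER_SERVICES_METRICS_PORT
    value: "{{ .Values.telemetry.metrics.port | default 8080 }}"
  # hypothetical flag telling the controller whether to add the sidecar
  # injection annotation to the Policy Server Deployment
  - name: KUBEWARDEN_POLICY_SERVER_SIDECAR_INJECTION
    value: "{{ .Values.telemetry.sidecar.enabled | default true }}"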

processors: [batch]
exporters: [otlp/jaeger]
{{- end }}
{{- toYaml .Values.telemetry.otelSpec | nindent 2 }}
Review comment (Member):

@jvanz, please don't kill me... but today, while looking into a failure of the e2e tests against this PR, we realized everything is a bit more complicated.

Assume this PR is merged. Inside the values.yaml file we have what used to be the previous configuration of the collector. As a user I can set .Values.telemetry.enabled = true (we still have this key) and set .Values.telemetry.tracing.enabled = false. In this scenario, the tracing configuration is still under .Values.telemetry.otelSpec unless the user changes it. As a result, the collector would be configured to enable metrics, but it would also have the tracing configuration provided by the default values (which would constantly complain while attempting to send the traces to a non-existing service).
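To make the scenario concrete, a hedged illustration of the combination described above (the otelSpec content is only an example of what the chart defaults could leave behind, not the actual default values):

telemetry:
  enabled: true
  tracing:
    enabled: false        # the user turns tracing off here...
  otelSpec:
    config:
      exporters:
        otlp/jaeger:      # ...but the default tracing exporter and pipeline
          endpoint: ""    # remain under otelSpec unless the user also edits it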

@kravciak, @fabriziosestito and I discussed what to do. Our idea, which we can discuss further with the other maintainers, is to support two scenarios:

  • The user enables otel integration with the sidecar model. We take the old code, add a new boolean flag inside of the Values.yaml and add some new if statements here.
  • The user disables the otel integration done with the sidecar and creates their own otel Collector configuration (maybe creating the actual object themselves, without this helm chart?).

We have to do some brainstorming about that...

Review comment (Member, Author):

I understand and agree with the finding. By the way, thanks for that.

But I'm working on the requested changes and wondering if it makes sense to merge this now. Considering that we need to change the controller to allow Kubewarden to send data to Otel collectors not running as a sidecar, does it make sense to change the Helm chart now? Our stack only supports the Otel sidecar right now. If we add a flag to disable the sidecar, the telemetry will not work at all. Therefore, any possible customization of the Otel collector will not be useful. I think we can park this change, change the controller to allow other Otel deployment models, and then come back here and do all the changes at once. Otherwise, if I apply the requested changes now, we will add a flag which disables the sidecar and telemetry will not work, which is equivalent to what we can do right now by disabling tracing and metrics in the current values structure.

@kubewarden/kubewarden-developers what do you think?

Review comment (Member):

Trying to understand the whole thing.

Ideally, I would like to offer, via a values key that acts as an enum, the possibility of:

  • otel sidecar (current approach).
  • an otel collector configuration managed by the chart, which is compatible with stackstate.
  • an otel collector configuration provided by the user, through the chart.

I agree with working on the controller first to allow to send data to otel collectors not on the sidecar, and park this PR.

Review comment (Member):

I think we should park this PR, define the whole strategy and then push everything out at the same time.

@jvanz do you want to write a proposal? Maybe this is worth a really lightweight RFC

@jvanz (Member, Author) commented Nov 27, 2024

Sorry, I think we can still work on this PR; I was overthinking it. The goal here is just to add more flexibility to the Otel collector configuration, not to change the deployment mode. Therefore, I've updated the PR, changing it to support the old way of configuration.

If you think we can merge it, fine. Otherwise, we can park it until we fix the controller.

Review comment (Contributor):

I think we should park it as well; I have doubts about the user specifying the spec of the OpenTelemetryCollector via values.

I would prefer to have either our sidecar "sane defaults" configuration on or entirely off and let the user deal with creating the OpenTelemetryCollector independently from our helm chart.

Adds additional telemetry configuration fields to allow users to add
their custom OpenTelemetry collector configuration together with the
Kubewarden configuration.

Signed-off-by: José Guilherme Vanz <[email protected]>

Successfully merging this pull request may close these issues: Allow more customization in the OpenTelemetry collector configuration