feat: extra OPTEL collector configuration. #581
base: main
Conversation
@kubewarden/kubewarden-developers I'm adding this to the Pending review column as a draft to give you the opportunity to sneak a peek and give initial feedback while I'm testing this with a Stackstate cluster.
I think this looks good. There's still the chance of someone re-defining (and I guess overloading?) our configuration by specifying the same receiver/processor/...; there's nothing we can do about that.
I just wonder if we shouldn't give up and let the user overwrite everything with their own values. That would simplify the helm chart (get rid of all the if .Values.<> statements that check for non-empty values).
Finally, I love the unit tests you added 👏 ❤️
Yes, to mitigate this issue I've added a schema validation file in the chart. Therefore, the helm chart will block the usage of the same names.
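For illustration, a guard like that can be expressed in values.schema.json roughly as follows; the exact structure of the chart's real schema file may differ, and the reserved names listed here are just examples:

{
  "properties": {
    "telemetry": {
      "properties": {
        "extraConfig": {
          "properties": {
            "receivers": {
              "propertyNames": { "not": { "enum": ["otlp"] } }
            },
            "exporters": {
              "propertyNames": { "not": { "enum": ["otlp/jaeger", "prometheus"] } }
            }
          }
        }
      }
    }
  }
}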
I considered that first. But then I remembered that our components expect the collector to be a sidecar. Therefore, considering that we are not planning to change that for now, if we allow the free-form approach the user could break our environment by changing the deployment mode, intentionally or not. I understand that we can document this. But I think even with documentation, we will get issues from users complaining that "telemetry is broken". But this is open to discussion. :)
Thanks! I really liked the tests as well.
@jvanz I like the approach (and the tests). Just one comment: the right short name for OpenTelemetry is OTEL, not OPTEL.
I also like it, I think it is understandable. Maybe a commented line saying that the extra configuration takes precedence would help. Another option to consider would be to provide an "enum switch" like we and other Rancher charts used to do for cert-manager (example), if we foresee that this is going to be complicated and we can provide some defaults. E.g.:
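For instance (the key name and allowed values below are purely hypothetical):

telemetry:
  # hypothetical enum switch: "sidecar" keeps the current behavior,
  # "custom" lets the user supply the collector config, "none" disables it
  collectorMode: "sidecar"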
Actually, the extra configuration does not take precedence. It lives alongside the previous configuration. In other words, the old otel pipeline remains the same; nothing changes. But if the user defines an extra configuration, it is merged into the otel collector configuration as well.
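A rough sketch of how such a merge can be done in the template; kubewarden.otelDefaultConfig is a hypothetical named template standing in for the chart's built-in collector configuration:

{{- /* Merge the user-supplied extraConfig on top of the chart defaults.
       With mustMergeOverwrite, the right-most argument wins on conflicts. */ -}}
{{- $default := include "kubewarden.otelDefaultConfig" . | fromYaml -}}
{{- $merged := mustMergeOverwrite $default (.Values.telemetry.extraConfig | default dict) -}}
config:
  {{- $merged | toYaml | nindent 2 }}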
@kubewarden/kubewarden-developers using this PR with the following values file, I'm able to have a minimal configuration that keeps the original Kubewarden OTEL pipelines and adds another one to send data to Stackstate:

telemetry:
  metrics:
    enabled: True
    port: 8080
  tracing:
    enabled: True
    jaeger:
      endpoint: "my-open-telemetry-collector.jaeger.svc.cluster.local:4317"
      tls:
        insecure: true
  extraOtelConfig:
    envFrom:
      - secretRef:
          name: open-telemetry-collector
  extraConfig:
    processors:
      resource:
        attributes:
          - key: k8s.cluster.name
            action: upsert
            value: k3d-kubewarden
          - key: service.instance.id
            from_attribute: k8s.pod.uid
            action: insert
    extensions:
      bearertokenauth:
        scheme: SUSEObservability
        token: "${env:API_KEY}"
    exporters:
      debug:
        verbosity: normal
      otlphttp/stackstate:
        auth:
          authenticator: bearertokenauth
        endpoint: https://otlp-stackstate.oldfield.arch.nue2.suse.org:443
        tls:
          insecure_skip_verify: true
    service:
      extensions:
        - bearertokenauth
      pipelines:
        traces/stackstate:
          receivers: [otlp]
          processors: [resource]
          exporters: [otlphttp/stackstate]
        metrics/stackstate:
          receivers: [otlp]
          processors: [resource]
          exporters: [debug, otlphttp/stackstate]

This is the resulting OTEL collector:

---
# Source: kubewarden-controller/templates/opentelemetry-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: kubewarden
  namespace: default
  labels:
    helm.sh/chart: kubewarden-controller-3.1.0
    app.kubernetes.io/name: kubewarden-controller
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "v1.18.0"
    app.kubernetes.io/component: controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kubewarden
  annotations:
spec:
  mode: sidecar
  envFrom:
    - secretRef:
        name: open-telemetry-collector
  config:
    extensions:
      bearertokenauth:
        scheme: SUSEObservability
        token: ${env:API_KEY}
    receivers:
      otlp:
        protocols:
          grpc: {}
    processors:
      batch: {}
      resource:
        attributes:
          - action: upsert
            key: k8s.cluster.name
            value: k3d-kubewarden
          - action: insert
            from_attribute: k8s.pod.uid
            key: service.instance.id
    exporters:
      otlp/jaeger:
        endpoint: my-open-telemetry-collector.jaeger.svc.cluster.local:4317
        tls:
          insecure: true
      prometheus:
        endpoint: ":8080"
      debug:
        verbosity: normal
      otlphttp/stackstate:
        auth:
          authenticator: bearertokenauth
        endpoint: https://otlp-stackstate.oldfield.arch.nue2.suse.org:443
        tls:
          insecure_skip_verify: true
    service:
      extensions:
        - bearertokenauth
      pipelines:
        metrics/stackstate:
          exporters:
            - debug
            - otlphttp/stackstate
          processors:
            - resource
          receivers:
            - otlp
        traces/stackstate:
          exporters:
            - otlphttp/stackstate
          processors:
            - resource
          receivers:
            - otlp
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [prometheus]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger]
@kubewarden/kubewarden-developers, during the tests I've figured out that our current approach of deploying the OTEL collector as a sidecar prevents us from using the Kubernetes Attributes processor, which is pretty important to fetch metadata from the cluster and to associate the metrics/traces with the resources running in the cluster:
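For reference, the processor configuration in question looks roughly like this (a sketch based on the upstream k8sattributes documentation; the exact attributes required by Stackstate may differ):

processors:
  k8sattributes:
    passthrough: false
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.namespace.name
        - k8s.node.name
        - k8s.deployment.name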
This is the configuration required for Stackstate, as shown in the documentation example. Therefore, I think we need a second iteration on this to improve our OTEL deployment, allowing us to enable this processor, which is only available in non-sidecar deployment modes.
@kubewarden/kubewarden-developers as we've discussed during a call, I've updated this PR to move the OTEL configuration from the template to the values file. I've also updated the tests to reflect that and started a new test to cover the controller deployment, to test a change there as well. I'm working on the end-to-end tests repository to make it work with the latest changes as well.
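To illustrate the shape of those tests, assuming the chart uses the helm-unittest plugin (the template path and values below are hypothetical):

suite: opentelemetry collector
templates:
  - templates/opentelemetry-collector.yaml  # hypothetical path
tests:
  - it: merges extraConfig into the collector configuration
    set:
      telemetry:
        metrics:
          enabled: true
        extraConfig:
          exporters:
            debug:
              verbosity: normal
    asserts:
      - equal:
          path: spec.config.exporters.debug.verbosity
          value: normal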
Let's discuss more about that after hackweek.
Also, let's not forget the Rancher UI has a configuration helper for metrics and tracing. Merging this PR would break it.
@@ -18,7 +18,7 @@ spec:
   {{- range keys .Values.podAnnotations }}
   {{ . | quote }}: {{ get $.Values.podAnnotations . | quote}}
   {{- end }}
-  {{- if or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled}}
+  {{- if or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled }}
We have to add a new boolean flag inside of the helm chart values file that determines if we have to trigger otel's sidecar injection.
If a user already has otel deployed using a daemonset, they will not want to have the sidecar injected.
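A minimal sketch of such a flag; the key name and the gating below are assumptions, not the final design:

# values.yaml (hypothetical key)
telemetry:
  sidecar:
    enabled: true

# deployment template: only add the injection annotation when the flag is on
{{- if and .Values.telemetry.sidecar.enabled (or .Values.telemetry.metrics.enabled .Values.telemetry.tracing.enabled) }}
sidecar.opentelemetry.io/inject: "true"
{{- end }}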
env:
  - name: KUBEWARDEN_POLICY_SERVER_SERVICES_METRICS_PORT
    value: "{{ .Values.telemetry.metrics.port | default 8080 }}"
{{- toYaml .Values.env | nindent 10 }}
TODO: the kubewarden controller is in charge of creating the Deployment resource of the Policy Server. We have to extend the controller with a new flag/env variable that determines whether the sidecar annotation has to be added to the Policy Server Deployment.
This env variable must then be exposed here.
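A sketch of what exposing it could look like; the variable name is hypothetical:

env:
  - name: KUBEWARDEN_POLICY_SERVER_SERVICES_METRICS_PORT
    value: "{{ .Values.telemetry.metrics.port | default 8080 }}"
  # hypothetical: lets the controller decide whether to add the sidecar
  # injection annotation to the Policy Server Deployment it creates
  - name: KUBEWARDEN_OTEL_SIDECAR_ENABLED
    value: "{{ .Values.telemetry.sidecar.enabled | default true }}"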
      processors: [batch]
      exporters: [otlp/jaeger]
{{- end }}
{{- toYaml .Values.telemetry.otelSpec | nindent 2 }}
@jvanz, please don't kill me... but today, while looking into a failure of the e2e tests against this PR, we realized everything is a bit more complicated.
Assume this PR is merged. Inside of the values.yaml file we have what used to be the previous configuration of the collector. As a user I can set .Values.telemetry.enabled = true (we still have this key) and set .Values.telemetry.tracing.enabled = false. In this scenario the tracing configuration is still under .Values.telemetry.otelSpec unless the user changes it. As a result, the collector would be configured to enable metrics, but it would also have the tracing configuration provided by the default values (which would constantly complain while attempting to send the traces to a non-existing service).
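To make the failure mode concrete, a values combination like this sketch would reproduce it:

telemetry:
  enabled: true
  tracing:
    enabled: false      # tracing is "off"...
  # ...but the default otelSpec still carries the tracing pipeline and the
  # otlp/jaeger exporter, so the collector keeps trying to reach a
  # non-existing Jaeger service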
@kravciak, @fabriziosestito and I discussed what to do. Our idea, which we can discuss further with the other maintainers, is to support two scenarios:
- The user enables the otel integration with the sidecar model. We take the old code, add a new boolean flag inside of the values.yaml, and add some new if statements here.
- The user disables the otel integration done with the sidecar and creates their own otel Collector configuration (maybe creating the actual object on their own, without this helm chart?).
We have to do some brainstorming about that...
I understand and agree with the finding. By the way, thanks for that.
But I'm working on the requested changes and wondering if it makes sense to merge this now. Considering that we need to change the controller to allow Kubewarden to send data to OTEL collectors not running as a sidecar, does it make sense to change the Helm chart now? Our stack only supports the OTEL sidecar right now. If we add a flag to disable the sidecar, telemetry will not work at all; therefore, any possible customization of the OTEL collector will not be useful. I think we can park this change, change the controller to allow other OTEL deployment models, and then come back here and do all the changes at once. Otherwise, if I apply the requested changes now, we will add a flag which disables the sidecar and telemetry will not work, which is equivalent to what we can do right now by disabling tracing and metrics in the current values structure.
@kubewarden/kubewarden-developers what do you think?
Trying to understand the whole thing.
Ideally I would like to offer, via a values key that acts as an enum, the possibility of:
- the otel sidecar (current approach);
- an otel collector configuration managed by the chart that is compatible with stackstate;
- an otel collector configuration provided by the user, through the chart.
I agree with working on the controller first, to allow sending data to otel collectors not on the sidecar, and parking this PR.
I think we should park this PR, define the whole strategy and then push everything out at the same time.
@jvanz do you want to write a proposal? Maybe this is worth a really lightweight RFC
Sorry, I think we can still work on this PR. I was overthinking it. The goal here is just to add more flexibility to the OTEL collector configuration, not to change the deployment mode. Therefore, I've updated the PR, changing it to support the old way of configuration.
If you think we can merge it, fine. Otherwise, we can park it until we fix the controller.
I think we should park it as well; I have doubts about the user specifying the spec of the OpenTelemetryCollector by using values.
I would prefer to have our sidecar "sane defaults" configuration either on or entirely off, and let the user deal with creating the OpenTelemetryCollector independently from our helm chart.
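In other words, something like the following, created by the user outside the chart (all names here are illustrative):

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: my-collector          # illustrative
  namespace: kubewarden
spec:
  mode: deployment            # not managed by the kubewarden helm chart
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [debug]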
Description
Adds additional telemetry configuration fields to allow users to add their custom OpenTelemetry collector configuration together with the Kubewarden configuration.
Fix #573