Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt benchmarks configuration to be in line with updated charts. #213

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

rodrigo-lourenco-lopes
Copy link
Contributor

@rodrigo-lourenco-lopes rodrigo-lourenco-lopes commented Nov 18, 2024

This is already a very messy PR which migrates these benchmarks to work with the latest changes in the helm charts.
I will create some follow-up issues to:

  • Test and change the core resources. Now that we have the entire application in one container these need to be tested and changed in order to keeep the same perfomance.
  • Add zeebe configuratitions to our helm values here, this can be done through core.configuration but this will overide the defaults in the config values in core.configmap in the camunda-platform-helm repo, see https://github.com/camunda/camunda-platform-helm/blob/75a1a164e9c7eb4cf021d1de4eee9b5ae39b6438/charts/camunda-platform-alpha/templates/core/configmap.yaml#L17-L21
  • (this was already an existing problem before the changes) The exporters seem to be laggin behind quite a lot and exporting at a slower pace than what we expecting them to do, for example capping at around 200 to 300 while processing is at 2000 in normal benchamrks. This has been observed in many weekly benchmarks, and needs to be investigated as it can possible be a regression.Due to some errors of benchmarks out of disk space in ES in some of the latest benchmarks we decided to double it for now.

Additionally this PR also addresses #214 fixing the tests related to assuring correct configurations with the golden files.

Closes: #214

Follow up issues:

@ChrisKujawa
Copy link
Member

@rodrigo-lourenco-lopes it is about disk space not memory :D

@rodrigo-lourenco-lopes rodrigo-lourenco-lopes changed the title Increase ES memory for the benchmarks. Increase ES disk size for normal benchmarks. Nov 18, 2024
@rodrigo-lourenco-lopes
Copy link
Contributor Author

@rodrigo-lourenco-lopes it is about disk space not memory :D

Ups, I misread the intention. Just corrected it.

Downloading camunda-platform from repo oci://ghcr.io/camunda/helm
Save error occurred: could not download oci://ghcr.io/camunda/helm/camunda-platform: ghcr.io/camunda/helm/camunda-platform:0.0.0-12.0.0-alpha1: not found
Error: could not download oci://ghcr.io/camunda/helm/camunda-platform: ghcr.io/camunda/helm/camunda-platform:0.0.0-12.0.0-alpha1: not found

@Zelldon also it seems that we are having issues pulling our chart dependency, have you seen this issue before?

@ChrisKujawa
Copy link
Member

@rodrigo-lourenco-lopes it is about disk space not memory :D

Ups, I misread the intention. Just corrected it.

Downloading camunda-platform from repo oci://ghcr.io/camunda/helm
Save error occurred: could not download oci://ghcr.io/camunda/helm/camunda-platform: ghcr.io/camunda/helm/camunda-platform:0.0.0-12.0.0-alpha1: not found
Error: could not download oci://ghcr.io/camunda/helm/camunda-platform: ghcr.io/camunda/helm/camunda-platform:0.0.0-12.0.0-alpha1: not found

@Zelldon also it seems that we are having issues pulling our chart dependency, have you seen this issue before?

I haven't seen this before 🤔

@rodrigo-lourenco-lopes
Copy link
Contributor Author

This commit also changed the templates and the configuration files, we will have to adapt the templates to this camunda/camunda-platform-helm@28d7927#diff-139e41c5d8eff029161d9db039ef8f0d46df8d6de4e91e42d7abb3c2103ba61a

@rodrigo-lourenco-lopes rodrigo-lourenco-lopes changed the title Increase ES disk size for normal benchmarks. Increase ES disk size, Fix tests and fetch correct Camunda version Nov 21, 2024
@ChrisKujawa
Copy link
Member

@rodrigo-lourenco-lopes can you try to deploy this into a namespace and check whether it comes up?

Copy link
Member

@ChrisKujawa ChrisKujawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rodrigo-lourenco-lopes I think two or three things we still need to validate

@@ -17,7 +17,7 @@ dependencies:
version: 6.4.0
condition: "prometheus-elasticsearch-exporter.enabled"
maintainers:
- name: Zelldon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines 525 to 383
# Persistent Volume Storage Class
storageClass: "ssd"
# Persistent Volume Size
size: 16Gi
size: 32Gi
# Persistent Volume Access Modes
accessModes: [ "ReadWriteOnce" ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ I think we have to validate whether the values files also have changed.

@rodrigo-lourenco-lopes
Copy link
Contributor Author

@ChrisKujawa It is starting core pods but these are failing to initialize, do we need to change the values.yaml for this?
The other pods seem to initialize correctly.

@ChrisKujawa
Copy link
Member

@ChrisKujawa It is starting core pods but these are failing to initialize, do we need to change the values.yaml for this? The other pods seem to initialize correctly.

Looks to me that a complete new Core section was introduced https://github.com/camunda/camunda-platform-helm/blob/main/charts/camunda-platform-alpha/values.yaml#L1968-L1980

@ChrisKujawa
Copy link
Member

@ChrisKujawa It is starting core pods but these are failing to initialize, do we need to change the values.yaml for this?
The other pods seem to initialize correctly.

I will validate with stakeholders

@@ -10,7 +10,7 @@ sources:
dependencies:
- name: camunda-platform
repository: "oci://ghcr.io/camunda/helm"
version: "0.0.0-8.7.0-alpha1"
version: "0.0.0-snapshot-alpha"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This is now a snapshot version. When alpha2 is released we can pin this to 0.0.0-8.7.0-alpha2. For now we need this to be able to use the fixes we made in the camunda platform helm charts.

@remcowesterhoud
Copy link

remcowesterhoud commented Dec 3, 2024

@rodrigo-lourenco-lopes sorry for hijacking your PR 😄 I can't assign you as a reviewer, but please have a look at the last 3 commits I made. These should fix benchmarks 🤞 Mine is currently running here

@ChrisKujawa If you have the time I'd appreciate it if you could also look at my last 3 commits 🙂

charts/zeebe-benchmark/values-realistic-benchmark.yaml Outdated Show resolved Hide resolved
charts/zeebe-benchmark/values.yaml Outdated Show resolved Hide resolved
Comment on lines 341 to 342
pvc:
accessModes: [ "ReadWriteOnce" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤡 Uh real break of the objects names

charts/zeebe-benchmark/values.yaml Show resolved Hide resolved
charts/zeebe-benchmark/values.yaml Show resolved Hide resolved
Comment on lines -466 to -474
env:
- name: OPERATE_LOG_APPENDER
value: Stackdriver
- name: OPERATE_LOG_STACKDRIVER_SERVICENAME
value: operate
- name: OPERATE_LOG_STACKDRIVER_SERVICEVERSION
valueFrom:
fieldRef:
fieldPath: metadata.namespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh that will be painful. This means we will not filter anymore by service name in stackdriver.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying I should add these env variables to the core application?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general yes we need some to set the right service name. But this allows to set only one service name before we split between zeebe and operate and gateway

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand it. I've addded these env variables to the core app now, but the way it sounds that's not sufficient.


exec /usr/local/camunda/bin/camunda
application.yaml: |

spring:
profiles:
active: auth
active: "operate,tasklist,broker,auth"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ Ah ok so it seems Operate is enabled.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be 🤔 I modified the SPRING_PROFILES_ACTIVE to disable it.

charts/zeebe-benchmark/templates/NOTES.txt Outdated Show resolved Hide resolved
@remcowesterhoud
Copy link

remcowesterhoud commented Dec 4, 2024

Okay I've processed some of the review comments. I can run the regular benchmarks fine manually. The realistic benchmarks don't really work. sa soon as I start Operate the readiness probe returns a 503.

I can't spend more time on this atm since I'm not medic anymore. @rodrigo-lourenco-lopes feel free to take over again 😄

FWIW, I believe we should merge this PR. It fixes the regular benchmarks so that's more than we have now and a fresh PR for any more changes would be a lot more comprehensible.

@rodrigo-lourenco-lopes
Copy link
Contributor Author

FWIW, I believe we should merge this PR. It fixes the regular benchmarks so that's more than we have now and a fresh PR for any more changes would be a lot more comprehensible.

@ChrisKujawa is it ok that we merge this now and start working on the next fixes?

Copy link
Member

@ChrisKujawa ChrisKujawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both 🚀🚀

optional: true
# To run Operate and Tasklist add these to the value of this variable
- name: SPRING_PROFILES_ACTIVE
value: "auth,broker,operate"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

limits:
cpu: 3500m
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check whether this fits in our nodes

Copy link

@remcowesterhoud remcowesterhoud Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the node selector to a bigger one for these realistic benchmarks 😄

@rodrigo-lourenco-lopes
Copy link
Contributor Author

Further context to the latest chages:

This is already a very messy PR which migrates these benchmarks to work with the latest changes in the helm charts.
I will create some follow up issues to:

  • Test and change the core resources. Now that we have the entire application in one container these need to be tested and changed in order to keeep the same perfomance.
  • Add or zeebe configuratitions to the values here, this can be done through core.configuration but will overide the defaults in the config values in core.configmap in the camunda-platform-helm repo, see https://github.com/camunda/camunda-platform-helm/blob/75a1a164e9c7eb4cf021d1de4eee9b5ae39b6438/charts/camunda-platform-alpha/templates/core/configmap.yaml#L17-L21
  • (this was already an existing problem before the changes) The exporters seem to be laggin behind quite a lot and exporting at a slower pace than what we expecting them to do, for example capping at around 200 to 300 while processing is at 2000 in normal benchamrks. This has been observed in many weekly benchmarks, and needs to be investigated as it can possible be a regression.

@rodrigo-lourenco-lopes rodrigo-lourenco-lopes changed the title Increase ES disk size, Fix tests and fetch correct Camunda version Adapt benchmarks configuration to be in line with updated charts. Dec 18, 2024
Copy link
Member

@ChrisKujawa ChrisKujawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you tested it because i feel there are some issues with it still? Not sure whether it makes sense to merge it like that tbh.

@@ -17,7 +17,7 @@ spec:
spec:
containers:
- name: starter
image: "{{ .Values.global.image.repository }}/starter:{{ .Values.global.image.tag }}"
image: "{{ .Values.starter.image.repository }}/starter:{{ .Values.starter.image.tag }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This change was not necessary we can keep the global. Otherwise this would mean we need to change our github actions etc.

@@ -18,7 +18,7 @@ spec:
spec:
containers:
- name: {{ $workerName }}-worker
image: "{{ $.Values.global.image.repository }}/worker:{{ $.Values.global.image.tag }}"
image: "{{ $.Values.workers.image.repository }}/worker:{{ $.Values.workers.image.tag }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ same as above

@@ -19,13 +19,13 @@ spec:
spec:
containers:
- name: publisher
image: "gcr.io/zeebe-io/starter:SNAPSHOT"
image: "/starter:"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ this is related to the removal/change of the global value

Comment on lines -7 to -13
image:
# Image.repository defines the repository from which to fetch the docker images
repository: "gcr.io/zeebe-io"
# Image.tag defines the tag / version which should be used in the chart
tag: SNAPSHOT
# Image.pullPolicy defines the image pull policy which should be used https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy
pullPolicy: Always
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be kept

fieldPath: metadata.namespace
- name: OPERATE_LOG_APPENDER
value: Stackdriver
- name: OPERATE_LOG_STACKDRIVER_SERVICENAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this makes sense

@@ -170,74 +180,72 @@ leaderBalancing:

# Zeebe configuration to configure Zeebe and Gateway
zeebe:
# Zeebe.config can be used to configure Zeebe Broker and Gateway additional without the need of overwriting all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we removed these?

enabled: false

# ELASTIC
# TODO: this configuration replaces the default provided in the configmaps in camunda-platform-helm, we should ideally try to add the configuration in the charts for these instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use the config we have above

runAsUser: 1000
capabilities:
add: [ "NET_ADMIN" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this for testing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix golden files tests after Camunda-platform-helm Changes
3 participants