Adapt benchmarks configuration to be in line with updated charts. #213

rodrigo-lourenco-lopes · 2024-11-18T14:35:53Z

This is already a very messy PR that migrates these benchmarks to work with the latest changes in the Helm charts.
I will create some follow-up issues to:

Test and adjust the core resources. Now that the entire application runs in one container, these need to be tested and changed in order to keep the same performance.
Add Zeebe configurations to our Helm values here. This can be done through core.configuration, but this will override the defaults in the config values in core.configmap in the camunda-platform-helm repo, see https://github.com/camunda/camunda-platform-helm/blob/75a1a164e9c7eb4cf021d1de4eee9b5ae39b6438/charts/camunda-platform-alpha/templates/core/configmap.yaml#L17-L21
(This was already an existing problem before the changes.) The exporters seem to be lagging behind quite a lot and exporting at a slower pace than we expect them to, for example capping at around 200 to 300 while processing is at 2000 in normal benchmarks. This has been observed in many weekly benchmarks and needs to be investigated, as it could be a regression.
Because some of the latest benchmarks ran out of disk space in ES, we decided to double the ES disk size for now.

Additionally, this PR addresses #214 by fixing the golden-file tests that verify the rendered configuration.

Closes: #214

Follow-up issues:

ChrisKujawa · 2024-11-18T15:01:50Z

@rodrigo-lourenco-lopes it is about disk space, not memory :D

rodrigo-lourenco-lopes · 2024-11-18T15:10:02Z

Oops, I misread the intention. Just corrected it.

Downloading camunda-platform from repo oci://ghcr.io/camunda/helm
Save error occurred: could not download oci://ghcr.io/camunda/helm/camunda-platform: ghcr.io/camunda/helm/camunda-platform:0.0.0-12.0.0-alpha1: not found
Error: could not download oci://ghcr.io/camunda/helm/camunda-platform: ghcr.io/camunda/helm/camunda-platform:0.0.0-12.0.0-alpha1: not found

@Zelldon also it seems that we are having issues pulling our chart dependency, have you seen this issue before?

ChrisKujawa · 2024-11-19T07:26:52Z

I haven't seen this before 🤔

rodrigo-lourenco-lopes · 2024-11-21T11:31:44Z

This commit also changed the templates and the configuration files, so we will have to adapt our templates to it: camunda/camunda-platform-helm@28d7927#diff-139e41c5d8eff029161d9db039ef8f0d46df8d6de4e91e42d7abb3c2103ba61a

ChrisKujawa · 2024-11-21T15:23:45Z

@rodrigo-lourenco-lopes can you try to deploy this into a namespace and check whether it comes up?

ChrisKujawa

Thanks @rodrigo-lourenco-lopes, I think there are two or three things we still need to validate.

ChrisKujawa · 2024-11-21T14:36:57Z

charts/zeebe-benchmark/Chart.yaml

@@ -17,7 +17,7 @@ dependencies:
    version: 6.4.0
    condition: "prometheus-elasticsearch-exporter.enabled"
 maintainers:
-  - name: Zelldon


ChrisKujawa · 2024-11-21T15:25:56Z

charts/zeebe-benchmark/values.yaml

        # Persistent Volume Storage Class
        storageClass: "ssd"
        # Persistent Volume Size
-        size: 16Gi
+        size: 32Gi
        # Persistent Volume Access Modes
        accessModes: [ "ReadWriteOnce" ]



❓ I think we have to validate whether the values files also have changed.

rodrigo-lourenco-lopes · 2024-11-21T15:40:33Z

@ChrisKujawa It is starting core pods but these are failing to initialize, do we need to change the values.yaml for this?
The other pods seem to initialize correctly.

ChrisKujawa · 2024-11-21T16:17:16Z

Looks to me that a complete new Core section was introduced https://github.com/camunda/camunda-platform-helm/blob/main/charts/camunda-platform-alpha/values.yaml#L1968-L1980

ChrisKujawa · 2024-11-21T16:32:50Z

I will validate with stakeholders

remcowesterhoud · 2024-12-03T12:42:04Z

charts/zeebe-benchmark/Chart.yaml

@@ -10,7 +10,7 @@ sources:
 dependencies:
  - name: camunda-platform
    repository: "oci://ghcr.io/camunda/helm"
-    version: "0.0.0-8.7.0-alpha1"
+    version: "0.0.0-snapshot-alpha"


⚠️ This is now a snapshot version. When alpha2 is released we can pin this to 0.0.0-8.7.0-alpha2. For now we need this to be able to use the fixes we made in the camunda platform helm charts.

remcowesterhoud · 2024-12-03T13:10:53Z

@rodrigo-lourenco-lopes sorry for hijacking your PR 😄 I can't assign you as a reviewer, but please have a look at the last 3 commits I made. These should fix benchmarks 🤞 Mine is currently running here

@ChrisKujawa If you have the time I'd appreciate it if you could also look at my last 3 commits 🙂

charts/zeebe-benchmark/values-realistic-benchmark.yaml

charts/zeebe-benchmark/values.yaml

ChrisKujawa · 2024-12-03T14:15:53Z

charts/zeebe-benchmark/values.yaml

+    pvc:
+     accessModes: [ "ReadWriteOnce" ]


🤡 Uh, a real break in the object names.

charts/zeebe-benchmark/values.yaml

ChrisKujawa · 2024-12-03T14:17:54Z

charts/zeebe-benchmark/values.yaml

-    env:
-      - name: OPERATE_LOG_APPENDER
-        value: Stackdriver
-      - name: OPERATE_LOG_STACKDRIVER_SERVICENAME
-        value: operate
-      - name: OPERATE_LOG_STACKDRIVER_SERVICEVERSION
-        valueFrom:
-          fieldRef:
-            fieldPath: metadata.namespace  


Uh, that will be painful. This means we will no longer be able to filter by service name in Stackdriver.

So you're saying I should add these env variables to the core application?

In general, yes, we need something to set the right service name. But this only allows setting a single service name, whereas before we split between Zeebe, Operate, and the gateway.

I don't fully understand it. I've added these env variables to the core app now, but from the way it sounds, that's not sufficient.

ChrisKujawa · 2024-12-03T14:19:00Z

charts/zeebe-benchmark/test/golden/c8-core-configmap.golden.yaml


    exec /usr/local/camunda/bin/camunda
  application.yaml: |

    spring:
      profiles:
-        active: auth
+        active: "operate,tasklist,broker,auth"


ℹ️ Ah ok so it seems Operate is enabled.

It shouldn't be 🤔 I modified the SPRING_PROFILES_ACTIVE to disable it.
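When reading this golden file, keep in mind that in Spring Boot an OS environment variable such as SPRING_PROFILES_ACTIVE takes precedence over `spring.profiles.active` in application.yaml, so the profiles rendered into the configmap are not necessarily the ones the core app runs with. The Python sketch below only models that precedence rule and the relaxed-binding name mapping; it is not Spring Boot code, and the env value used is a hypothetical one that leaves out Operate and Tasklist.

```python
from typing import Dict, List, Optional


def env_var_name(prop: str) -> str:
    """Map a dotted property name to its environment-variable form
    (relaxed binding: drop '-', replace '.' with '_', upper-case)."""
    return prop.replace("-", "").replace(".", "_").upper()


def resolve(prop: str, env: Dict[str, str], app_yaml: Dict[str, str]) -> Optional[str]:
    """Return the effective value: the environment variable wins over application.yaml."""
    name = env_var_name(prop)
    if name in env:
        return env[name]
    return app_yaml.get(prop)


def active_profiles(env: Dict[str, str], app_yaml: Dict[str, str]) -> List[str]:
    """Split the effective comma-separated profile list."""
    raw = resolve("spring.profiles.active", env, app_yaml) or ""
    return [p.strip() for p in raw.split(",") if p.strip()]


# Value rendered into application.yaml by the chart (from the golden file).
app_yaml = {"spring.profiles.active": "operate,tasklist,broker,auth"}
# Hypothetical env override on the core container that leaves out operate/tasklist.
env = {"SPRING_PROFILES_ACTIVE": "auth,broker"}

print(active_profiles(env, app_yaml))  # ['auth', 'broker']
print(active_profiles({}, app_yaml))   # ['operate', 'tasklist', 'broker', 'auth']
```

So checking the golden configmap alone does not tell whether Operate is enabled; the container env has to be checked as well.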

charts/zeebe-benchmark/templates/NOTES.txt

remcowesterhoud · 2024-12-04T09:14:02Z

Okay, I've addressed some of the review comments. I can run the regular benchmarks fine manually. The realistic benchmarks don't really work: as soon as I start Operate, the readiness probe returns a 503.

I can't spend more time on this at the moment since I'm no longer the medic. @rodrigo-lourenco-lopes feel free to take over again 😄

FWIW, I believe we should merge this PR. It fixes the regular benchmarks so that's more than we have now and a fresh PR for any more changes would be a lot more comprehensible.

rodrigo-lourenco-lopes · 2024-12-05T09:08:04Z

@ChrisKujawa is it ok that we merge this now and start working on the next fixes?

ChrisKujawa

Thank you both 🚀🚀

ChrisKujawa · 2024-12-05T11:41:22Z

charts/zeebe-benchmark/values-realistic-benchmark.yaml

+            optional: true
+      # To run Operate and Tasklist add these to the value of this variable
+      - name: SPRING_PROFILES_ACTIVE
+        value: "auth,broker,operate"


ChrisKujawa · 2024-12-05T11:42:10Z

charts/zeebe-benchmark/values-realistic-benchmark.yaml

      limits:
+        cpu: 3500m


We need to check whether this fits in our nodes

I changed the node selector to a bigger one for these realistic benchmarks 😄

rodrigo-lourenco-lopes · 2024-12-18T15:36:25Z

Further context on the latest changes:

This is already a very messy PR that migrates these benchmarks to work with the latest changes in the Helm charts.
I will create some follow-up issues to:

Test and adjust the core resources. Now that the entire application runs in one container, these need to be tested and changed in order to keep the same performance.
Add our Zeebe configurations to the values here. This can be done through core.configuration, but it will override the defaults in the config values in core.configmap in the camunda-platform-helm repo, see https://github.com/camunda/camunda-platform-helm/blob/75a1a164e9c7eb4cf021d1de4eee9b5ae39b6438/charts/camunda-platform-alpha/templates/core/configmap.yaml#L17-L21
(This was already an existing problem before the changes.) The exporters seem to be lagging behind quite a lot and exporting at a slower pace than we expect them to, for example capping at around 200 to 300 while processing is at 2000 in normal benchmarks. This has been observed in many weekly benchmarks and needs to be investigated, as it could be a regression.

ChrisKujawa

Did you test it? I feel there are still some issues with it. Not sure whether it makes sense to merge it like that, tbh.

ChrisKujawa · 2024-12-19T20:53:06Z

charts/zeebe-benchmark/templates/starter.yaml

@@ -17,7 +17,7 @@ spec:
    spec:
      containers:
        - name: starter
-          image: "{{ .Values.global.image.repository }}/starter:{{ .Values.global.image.tag }}"
+          image: "{{ .Values.starter.image.repository }}/starter:{{ .Values.starter.image.tag }}"


❌ This change was not necessary; we can keep the global value. Otherwise we would need to change our GitHub Actions etc.

ChrisKujawa · 2024-12-19T20:54:02Z

charts/zeebe-benchmark/templates/workers.yaml

@@ -18,7 +18,7 @@ spec:
    spec:
      containers:
        - name: {{ $workerName }}-worker
-          image: "{{ $.Values.global.image.repository }}/worker:{{ $.Values.global.image.tag }}"
+          image: "{{ $.Values.workers.image.repository }}/worker:{{ $.Values.workers.image.tag }}"


❌ same as above

ChrisKujawa · 2024-12-19T20:56:42Z

charts/zeebe-benchmark/test/golden/publisher.golden.yaml

@@ -19,13 +19,13 @@ spec:
    spec:
      containers:
        - name: publisher
-          image: "gcr.io/zeebe-io/starter:SNAPSHOT"
+          image: "/starter:"


❌ this is related to the removal/change of the global value
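The `/starter:` output is what you get when neither `starter.image.repository` nor `starter.image.tag` is set: Helm prints a missing value as an empty string. The Python sketch below mimics only that lookup-and-render step (it is not Helm's template engine); `lookup` and `render_image` are hypothetical helpers, and the values are the `global.image` defaults removed from values-realistic-benchmark.yaml in this PR.

```python
from typing import Any, Dict, Optional


def lookup(values: Dict[str, Any], path: str) -> Optional[Any]:
    """Walk a dotted path like 'global.image.tag'; return None if any key is missing."""
    node: Any = values
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def render_image(values: Dict[str, Any], scope: str, name: str) -> str:
    """Render '<repository>/<name>:<tag>', printing missing values as '' like Helm does."""
    repo = lookup(values, f"{scope}.image.repository")
    tag = lookup(values, f"{scope}.image.tag")
    repo_str = "" if repo is None else str(repo)
    tag_str = "" if tag is None else str(tag)
    return f"{repo_str}/{name}:{tag_str}"


# Values that define only the global image (no starter.image section).
values = {"global": {"image": {"repository": "gcr.io/zeebe-io", "tag": "SNAPSHOT"}}}

print(render_image(values, "global", "starter"))   # gcr.io/zeebe-io/starter:SNAPSHOT
print(render_image(values, "starter", "starter"))  # /starter:
```

Keeping the template on `.Values.global.image.*` reproduces the old golden output without touching the values files or the GitHub Actions.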

ChrisKujawa · 2024-12-19T20:57:37Z

charts/zeebe-benchmark/values-realistic-benchmark.yaml

-  image:
-    # Image.repository defines the repository from which to fetch the docker images
-    repository: "gcr.io/zeebe-io"
-    # Image.tag defines the tag / version which should be used in the chart
-    tag: SNAPSHOT
-    # Image.pullPolicy defines the image pull policy which should be used https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy
-    pullPolicy: Always


This should be kept

ChrisKujawa · 2024-12-19T20:58:15Z

charts/zeebe-benchmark/values-realistic-benchmark.yaml

+            fieldPath: metadata.namespace
+      - name: OPERATE_LOG_APPENDER
+        value: Stackdriver
+      - name: OPERATE_LOG_STACKDRIVER_SERVICENAME


I don't think this makes sense.

ChrisKujawa · 2024-12-19T20:59:20Z

charts/zeebe-benchmark/values.yaml

@@ -170,74 +180,72 @@ leaderBalancing:

 # Zeebe configuration to configure Zeebe and Gateway
 zeebe:
-  # Zeebe.config can be used to configure Zeebe Broker and Gateway additional without the need of overwriting all


Why did we remove these?

ChrisKujawa · 2024-12-19T21:00:25Z

charts/zeebe-benchmark/values.yaml

-    enabled: false
-
-  # ELASTIC
+    # TODO:  this configuration replaces the default provided in the configmaps in camunda-platform-helm, we should ideally try to add the configuration in the charts for these instead.


You need to use the config we have above

ChrisKujawa · 2024-12-19T21:00:51Z

charts/zeebe-benchmark/values.yaml

      runAsUser: 1000
-      capabilities:
-        add: [ "NET_ADMIN" ]


We need this for testing

rodrigo-lourenco-lopes self-assigned this Nov 18, 2024

rodrigo-lourenco-lopes requested a review from ChrisKujawa as a code owner November 18, 2024 14:35

rodrigo-lourenco-lopes force-pushed the rl-increase-es-memory branch from 236a14b to 16e76ae November 18, 2024 15:06

rodrigo-lourenco-lopes changed the title from "Increase ES memory for the benchmarks." to "Increase ES disk size for normal benchmarks." Nov 18, 2024

rodrigo-lourenco-lopes force-pushed the rl-increase-es-memory branch from 1c37bae to 87e0ae0 November 21, 2024 13:33

rodrigo-lourenco-lopes changed the title from "Increase ES disk size for normal benchmarks." to "Increase ES disk size, Fix tests and fetch correct Camunda version" Nov 21, 2024

ChrisKujawa reviewed Nov 21, 2024

remcowesterhoud force-pushed the rl-increase-es-memory branch from ca6bb5f to a8af21e December 3, 2024 08:49

remcowesterhoud reviewed Dec 3, 2024

remcowesterhoud force-pushed the rl-increase-es-memory branch from 1ca03f3 to 12f8052 December 3, 2024 13:09

remcowesterhoud requested a review from ChrisKujawa December 3, 2024 13:09

ChrisKujawa reviewed Dec 3, 2024

remcowesterhoud requested a review from ChrisKujawa December 4, 2024 09:12

ChrisKujawa approved these changes Dec 5, 2024

rodrigo-lourenco-lopes added 5 commits December 18, 2024 14:11

feat: increase ES memory for the benchmarks.

9892cb7

fix: replace release tag

2224691

To see the full reasoning check the slack thread: https://camunda.slack.com/archives/C06GF0JPY68/p1732001694427559?thread_ts=1731918845.694709&cid=C06GF0JPY68

fix: fix tests to accommodate the changes in charts refactorings.

781ab01

The link to the commit in camunda-platform-helm camunda/camunda-platform-helm@28d7927#diff-624a96601c5010c87f441781ea4c0e803b469e0bf2507e7ea86a651b0967ca9d

fix: fix maintainer name.

8084d8d

refactor: remove unused, and rename appropriately golden files.

ec4f67b

remcowesterhoud and others added 16 commits December 18, 2024 14:12

feat: switch to camunda "core" component, instead of individual zeebe, tasklist, operate component

d951795

test: update golden files

02d5b05

I have autogenerated these because there were so many changes

fix: only print operate in notes if it's enabled

4bced98

feat: increase resources of single core app

1872353

Add the resources previously used by the gateway to the core app.

fix: run operate and tasklist in realistic benchmark

da4ec32

fix: add operate resources to core app

fbf2313

fix: add operate stackdriver env variables to core app

1c1da6c

fix: add env variables to realistic benchmark

766a78f

We can't just override 1 env variable to activate operate. Instead all the env variables got overridden and thus we were missing most of them. By adding them to this values file we overcome that problem.

feat: print active profiles

7f46105

test: regenerate golden files

ea07fac

fix: fix the way we display active profiles.

438ef8e

feat: append dependency to "0.0.0-8.7.0-alpha2"

06bc6c3

feat: add the option to fetch image form a different repo in starters and workers.

551d8bb

feat: fetch a different image for starters and workers.

8699e8d

These images are not available in the default registry.

feat: remap the values.yaml to fit the new chart.

da03a61

fix: remove global values in values-realistic-benchmark.yaml

413d6d0

rodrigo-lourenco-lopes force-pushed the rl-increase-es-memory branch from 7500432 to 413d6d0 December 18, 2024 13:39

fix: fix golden tests.

b3fb717
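Commit 766a78f explains why overriding a single env variable was not enough. When Helm layers values files, nested maps are merged key by key, but a list such as `env` from the later file replaces the earlier list as a whole. The Python sketch below models just that rule (it ignores other Helm details such as `null` deleting a key); `merge_values` is a hypothetical helper, and the `resources` map and env entries are hypothetical stand-ins for the chart defaults.

```python
import copy
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge override into base the way Helm layers values files:
    dicts are merged recursively, everything else (lists included) is replaced."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical defaults for the core app.
defaults = {
    "core": {
        "resources": {"limits": {"cpu": "2"}},
        "env": [
            {"name": "OPERATE_LOG_APPENDER", "value": "Stackdriver"},
            {"name": "OPERATE_LOG_STACKDRIVER_SERVICENAME", "value": "operate"},
            {"name": "SPRING_PROFILES_ACTIVE", "value": "auth,broker"},
        ],
    }
}

# Realistic-benchmark override that only wants to change one variable.
override = {"core": {"env": [{"name": "SPRING_PROFILES_ACTIVE", "value": "auth,broker,operate"}]}}

merged = merge_values(defaults, override)
print([e["name"] for e in merged["core"]["env"]])  # ['SPRING_PROFILES_ACTIVE']
print(merged["core"]["resources"])                  # {'limits': {'cpu': '2'}}
```

The map survives the merge while the env list shrinks to one entry, which is why the realistic values file has to repeat every env variable it still needs.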

rodrigo-lourenco-lopes changed the title from "Increase ES disk size, Fix tests and fetch correct Camunda version" to "Adapt benchmarks configuration to be in line with updated charts." Dec 18, 2024

rodrigo-lourenco-lopes mentioned this pull request Dec 19, 2024

Add back zeebe configuration to our helm values camunda/camunda-platform-helm#2706 (open)

rodrigo-lourenco-lopes requested a review from ChrisKujawa December 19, 2024 10:38

ChrisKujawa requested changes Dec 19, 2024
