
Switch to Parallel start for Kafka and Zookeeper #226

Merged: 1 commit merged into master on Nov 29, 2018

Conversation

@solsson (Contributor) commented Nov 28, 2018

Is there any reason to keep using OrderedReady with either Kafka or Zookeeper? We still get ordered rolling updates after #55. The problem with OrderedReady is that, for kafka and pzoo, if the zone holding the volume for index 0 is unavailable, none of the other pods can be restarted.
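
For reference, the setting in question is podManagementPolicy on the StatefulSet spec. A minimal sketch of the change, using illustrative names and values rather than this repo's actual manifests:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: kafka
      namespace: kafka
    spec:
      podManagementPolicy: Parallel   # previously OrderedReady (the default); pods no longer wait for lower ordinals to be Ready before starting
      serviceName: broker
      replicas: 3
      selector:
        matchLabels:
          app: kafka
      template:
        metadata:
          labels:
            app: kafka
        spec:
          containers:
          - name: broker
            image: example/kafka      # placeholder image, not the image used in this repo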

@sibtainabbas10 I noticed you used Parallel in #28 back in January. Have you seen any issues?

@solsson solsson added this to the 5.0 - Java 11 milestone Nov 28, 2018
@solsson solsson requested a review from atamon November 28, 2018 15:29
@atamon (Contributor) left a comment

The only time we tried this in production was for minio, right? Their distribution/replication strategy even required us to. I can't remember noticing anything strange in how Kubernetes behaved, though we did have quite a few crashlooping pods because the minio startup sequence was misbehaving for us.

@solsson (Contributor, Author) commented Nov 28, 2018

I can't come up with any reason to stick with OrderedReady.

I just tested applying all manifests at the same time in a fresh minikube (k apply -f zookeeper/ && k apply -f kafka/ && k apply -f kafka/test/ after configuration + rbac). It took 2:30 min for Kafka to become ready, with 3 restarts per kafka pod, and a further 1 minute for the tests to go ready.

@solsson solsson merged commit c41f9f9 into master Nov 29, 2018
@stigok commented Nov 29, 2018

But a problem arises when the StatefulSet changes and the pods are switched out. We'd definitely not want to take down the whole cluster at once. In a Deployment resource we could set maxUnavailable and maxSurge, but that won't work for a StatefulSet.

@solsson (Contributor, Author) commented Nov 29, 2018

when the StatefulSet changes and the pods are switched out

@stigok I tested that by editing memory limits and it behaved as expected: never more than one pod unready. Did you observe a different behavior?
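
For context on why updates stay one-at-a-time even with Parallel: podManagementPolicy only affects scaling and initial startup, while rolling updates are governed separately by spec.updateStrategy. A sketch of the relevant fields (the partition value is shown only for illustration):

    spec:
      podManagementPolicy: Parallel
      updateStrategy:
        type: RollingUpdate        # pods are replaced one at a time, highest ordinal first, waiting for each to become Ready
        rollingUpdate:
          partition: 0             # optional; only pods with ordinal >= partition are updated, 0 means all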

@solsson (Contributor, Author) commented Nov 30, 2018

I've verified again, on k8s 1.11.2, and the update behavior is as expected.

rolling-update-test1.txt

The kafkacat-based test did have a short unreadiness period when a partition leader was down, but only for one of the three partitions. I'm not sure how to interpret that.

test-kafkacat [0] offset 396
Last message is 10.255 old:
Test kafkacat-hrhwr@2018-11-30T08:00:21,053167482+00:00
test-kafkacat [0] offset 398
test-kafkacat [0] offset 399
test-kafkacat [0] offset 400
test-kafkacat [0] offset 401
% ERROR: offsets_for_times failed: Broker: Not leader for partition
Last message is 10.205 old:
Test kafkacat-hrhwr@2018-11-30T08:01:21,051927817+00:00
Last message is 20.207 old:
Test kafkacat-hrhwr@2018-11-30T08:01:21,051927817+00:00
test-kafkacat [0] offset 405

I've also tested the following bad patch. It results in the desired behavior that only kafka-2 goes unready, and the rollout process then stops.

         command:
-        - ./bin/kafka-server-start.sh
+        - tail
+        - -f
         - /etc/kafka/server.properties

I was positively surprised that kubectl -n kafka rollout undo statefulset kafka resolved the situation, behaving like it would for a Deployment.
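
A sketch of that recovery path; the status and history commands are added here for illustration, only the undo command comes from the comment above:

    # watch the rollout hang on the broken pod
    kubectl -n kafka rollout status statefulset kafka
    # confirm there is a previous revision to roll back to
    kubectl -n kafka rollout history statefulset kafka
    # revert to the previous revision
    kubectl -n kafka rollout undo statefulset kafka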

@stigok commented Dec 10, 2018

You are absolutely right. It worked like a charm when applying the JMX exporter patch now 👍
