Release v6.0.4 Seems to be a Breaking Release? #333

Closed
jsmythsci opened this issue Sep 29, 2020 · 6 comments

@jsmythsci

According to README.md, release 7.0 will be a breaking release "with nonroot and native bases" but it seems like nonroot was made the default with c212ea6, which is included in v6.0.4.

When I tried to deploy the non-root containers into a Kubernetes cluster that is currently running a deployment based on v6.0.3, all 3 of the first containers (kafka, zoo, pzoo) failed to start and I was forced to roll back the changes.

I could work around this issue by basing my variants off of ../../rbac-namespace-default, ../../kafka and ../../zookeeper instead of ../scale-3-5 but wanted to confirm that the nonroot changes were intentionally included in this release.

@solsson
Contributor

solsson commented Sep 29, 2020

You're right. The meaning of "breaking" with 7.0 was that the base folders will change behavior, mostly wrt replicas. Variants have been experimental for a while, and I think it's best to run master. I should get round to releasing again, because we've been running dev variants reliably as well as non-root in production since a few months back.

@solsson solsson self-assigned this Sep 29, 2020
@solsson
Contributor

solsson commented Sep 29, 2020

#322 should be considered for the release, and https://github.com/cloudworkz/kafka-minion/releases/tag/v1.0.2

@jsmythsci
Author

Thank you for the clarification.

Do you have any suggestions or guidance for moving from the "regular" containers to nonroot?

We are using local storage for persistence and I can see that the files there are all owned by root. Would changing ownership to match the userid used in the nonroot containers be sufficient or is there more to it?

As an additional data point, I can see that the volumes for the 3 containers that tried to start after merging v6.0.4 have all had their group ownership changed to "nogroup" and most of the files have group write permissions added.

The only files I see in the kafka persistence directory that do not have group write are snapshot and leader-epoch-checkpoint files that have been written since the container was reverted back to v6.0.3. On the zookeeper side, it seems to be the same, except that ./data/myid retains group write permission even after having been updated after rollback.

@jsmythsci
Author

In case anyone else is following and/or interested in this, it seems that the update failures may not be related to non-root containers after all.

I think the ZK containers are failing to come online because of a known ZK upgrade issue related to missing snapshots (https://issues.apache.org/jira/browse/ZOOKEEPER-3513 and https://issues.apache.org/jira/browse/ZOOKEEPER-3056). I think the updated Kafka container is not coming online because it is timing out while trying to connect to the new ZK containers.

I will post back when I have more information.

@jsmythsci
Author

I confirmed that switching to non-root containers was actually pretty straightforward.

I found out that we had introduced an issue in our Kustomization that resulted in DNS lookups failing for the zoo containers. This didn't seem to make our original deployment fail overall because the 3 pzoo instances all resolved just fine.

After fixing that issue I had to apply the following transformation to zookeeper-config to get around the issue of the missing snapshots:

kustomization.yaml:

bases:
- scale-3-5
patchesStrategicMerge:
- zk-trust-empty-snapshot.yaml

zk-trust-empty-snapshot.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: zookeeper-config
data:
  zookeeper.properties: |
    4lw.commands.whitelist=ruok
    tickTime=2000
    dataDir=/var/lib/zookeeper/data
    dataLogDir=/var/lib/zookeeper/log
    clientPort=2181
    maxClientCnxns=2
    initLimit=5
    syncLimit=2
    server.1=pzoo-0.pzoo:2888:3888:participant
    server.2=pzoo-1.pzoo:2888:3888:participant
    server.3=pzoo-2.pzoo:2888:3888:participant
    server.4=zoo-0.zoo:2888:3888:participant
    server.5=zoo-1.zoo:2888:3888:participant
    snapshot.trust.empty=true

This file just appends snapshot.trust.empty=true to the existing zookeeper.properties defined in 10zookeeper-config.yml.
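
A strategic merge patch replaces the whole zookeeper.properties string rather than merging it line by line, so the patch has to repeat every existing setting. One way to confirm that the copy differs from the base only by the added line is a small check like the sketch below. It is a minimal illustration, not part of the repository: `parse_properties` and `diff_properties` are hypothetical helper names, and `BASE` is assumed to be the patched text above minus its last line, not read from 10zookeeper-config.yml.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_properties(base_text, patched_text):
    """Return (added, removed, changed) settings between two property files."""
    base = parse_properties(base_text)
    patched = parse_properties(patched_text)
    added = {k: v for k, v in patched.items() if k not in base}
    removed = {k: v for k, v in base.items() if k not in patched}
    changed = {
        k: (base[k], patched[k])
        for k in base
        if k in patched and base[k] != patched[k]
    }
    return added, removed, changed


# Assumption: the base file matches the patch shown above without its last line.
BASE = """\
4lw.commands.whitelist=ruok
tickTime=2000
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
clientPort=2181
maxClientCnxns=2
initLimit=5
syncLimit=2
server.1=pzoo-0.pzoo:2888:3888:participant
server.2=pzoo-1.pzoo:2888:3888:participant
server.3=pzoo-2.pzoo:2888:3888:participant
server.4=zoo-0.zoo:2888:3888:participant
server.5=zoo-1.zoo:2888:3888:participant
"""
PATCHED = BASE + "snapshot.trust.empty=true\n"

added, removed, changed = diff_properties(BASE, PATCHED)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

If `removed` or `changed` is non-empty, the patch has drifted from the base config and would silently override settings on the next upgrade.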

This change seems to have been all that was required to deploy release 6.0.4 and have it start successfully.

We still have an outstanding issue in that our customization to the broker container command gets lost when using non-root containers but as of now all 3 StatefulSets seem to be working normally in our test environment.

@jsmythsci
Author

Well, that's strange.

Our Kafka broker startup command --overrides were consistently not getting applied for days.

I tried to fix it by switching from patchesJson6902 to patchesStrategicMerge and redefining the whole command block, which worked as expected. Then I tried to recreate the issue by backing out that change and I couldn't reproduce the original issue any more. I checked our test environment and confirmed that the expected --overrides were in place so I guess it was just an issue with my local kustomize build being wonky.
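
For anyone who wants to check a running environment the same way, the container command can be read back from the StatefulSet and scanned for overrides. The sketch below is illustrative only. It assumes the JSON comes from something like `kubectl get statefulset kafka -o json`, that the container is named `broker`, and that overrides are passed as separate `--override` and `key=value` tokens. `broker_overrides` is a hypothetical helper, and the sample command and property values are made up for the example.

```python
import json


def broker_overrides(statefulset_json, container_name="broker"):
    """Collect --override key=value pairs from a container's command and args."""
    sts = json.loads(statefulset_json)
    containers = sts["spec"]["template"]["spec"]["containers"]
    for container in containers:
        if container["name"] != container_name:
            continue
        tokens = (container.get("command") or []) + (container.get("args") or [])
        overrides = {}
        # Look at each token together with the one that follows it.
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "--override" and "=" in value:
                key, _, val = value.partition("=")
                overrides[key] = val
        return overrides
    raise KeyError(f"no container named {container_name!r}")


# Made-up sample standing in for real kubectl output (assumption).
SAMPLE = json.dumps({
    "spec": {"template": {"spec": {"containers": [{
        "name": "broker",
        "command": [
            "./bin/kafka-server-start.sh",
            "/etc/kafka/server.properties",
            "--override", "log.retention.hours=-1",
            "--override", "auto.create.topics.enable=false",
        ],
    }]}}}
})

overrides = broker_overrides(SAMPLE)
print(overrides)
```

Comparing the returned dictionary against the expected settings separates a problem in the rendered manifests from a stale local kustomize build.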

@solsson I see you assigned this to yourself so I will leave it open in case you believe there is work to be done but it seems like my own issues with upgrading to v6.0.4 came down to:

  1. A bug in our existing kustomization, and
  2. A known issue upgrading ZooKeeper from 3.4 to 3.5.
