-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
orchestratord TLS for balancerd and environmentd #30444
base: main
Are you sure you want to change the base?
orchestratord TLS for balancerd and environmentd #30444
Conversation
8bb8961
to
2874f38
Compare
76d791d
to
899769a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense -- will take another look when we're closer to merging, but the plumbing seems right to me for envd+balancerd
@@ -156,7 +187,8 @@ fn create_console_deployment_object( | |||
..Default::default() | |||
}]; | |||
|
|||
if config.enable_tls { | |||
if mz.spec.console_external_certificate_spec.is_some() { | |||
// TODO define volumes, volume_mounts, and any arg changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to wire in the console certs in this PR? or leave for a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to leave it for a follow up. I don't know how the console works, and didn't want to block this on me figuring that out.
Console TLS support is not required for parity with the existing SaaS offering, which was the goal of this PR.
// The cert-manager Issuer or ClusterIssuer to use for database internal communication. | ||
// The issuer_ref field is required. | ||
// This currently is only used for environmentd, but will eventually support clusterd. | ||
pub internal_certificate_spec: Option<MaterializeCertSpec>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about it some more, i wonder how annoying it is going to be to have this solely configured at the environment level - it kind of feels like we want to be able to set defaults for all of these fields at the orchestratord level (as part of the helm chart config) because in the vast majority of cases, users are going to want all of the fields here (except possibly the dns names) to be enforced to be identical across environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should have one set of defaults passed to orchestratord from the helm chart, or do we need a set for each type of cert?
fea4772
to
864e8cd
Compare
Comment the cert specs in the testing environmentd.yaml. Having them present would require cert-manager in testing clusters, which is not guaranteed, and doesn't have documentation.
864e8cd
to
f1b22ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notting one thing down while testing this out in different scenarios, I noticed so far is that the console container would crash if the balancerd is not ready yet:
2024/11/28 18:31:37 [emerg] 1#1: host not found in upstream "mzghw4549ri2-balancerd.materialize-environment.svc.cluster.local" in /etc/nginx/conf.d/default.conf:28
nginx: [emerg] host not found in upstream "mzghw4549ri2-balancerd.materialize-environment.svc.cluster.local" in /etc/nginx/conf.d/default.conf:28
First time I'm seeing this, though not sure if this is related to this PR specifically or if it has always been there but might be best to wait for the console creation until the balancerd is ready.
Not a blocker for this PR but might be an issue for the nighty tests that @def- has set up.
Not sure if we are too worried about this edge case but, one more thing that I noticed is that if you were to deploy Materialize with no TLS configured, and then try to set TLS, upgrade the operator and the CRDs, the console pods do not get recreated during the instance rolling upgrade, so they end up not being able to connect to balancerd:
After manually deleting and recreating the console pods it works well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tested this on a fresh new EKS cluster and besides the two comments above, everything seems to be working as expected!
Wrote a simple guide while testing this here.
Add support in orchestratord to configure TLS for balancerd and environmentd.
This does not yet configure TLS for clusterd or the console, nor does it configure MTLS.
Motivation
Part of https://github.com/MaterializeInc/cloud/issues/8363, and brings orchestratord closer to parity with our SaaS.
Tips for reviewer
The cert-manager CRDs were generated with Kopium, using src/k8s/src/gen/generate-kopium.sh in the cloud repo. The exact output in this PR is based on some changes to that script in this draft PR.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.