Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plumb customer-configured DNS into instances #4265

Merged
merged 5 commits into from
Oct 19, 2023
Merged

plumb customer-configured DNS into instances #4265

merged 5 commits into from
Oct 19, 2023

Conversation

iliana
Copy link
Contributor

@iliana iliana commented Oct 11, 2023

Currently OPTE hardcodes DNS servers to 8.8.8.8, and we would prefer to not do this.

When setting up the rack, customers configure DNS servers. Currently this is added to Nexus's deployment configuration, so that Nexus can resolve external domain names for SAML. This change plumbs these DNS servers through to sled-agent and OPTE.

For non-instance OPTE ports, DhcpCfg::default() is used, which is equivalent to "no hostname, no domain name, no DNS servers, no search domains".

This is not the final state of the DHCP work we want to do; in talking with @rmustacc and @rcgoodfellow we agree that implementing this minimal DHCP option set is the correct thing to do urgently; and that we still need to design the broader inter-instance networking picture (whether we want an instance-facing recursive resolver on the rack and whether it should resolve domain names for other instances with VPC IPs, what those hostnames and search domains should be, and how customers can modify these settings).

@iliana iliana requested review from luqmana and gjcolombo October 17, 2023 21:16
@iliana
Copy link
Contributor Author

iliana commented Oct 17, 2023

@gjcolombo FYI — requested a review from you because this was written prior to #4194 landing, and I did sort of the bare minimum merge from main just now. I'd like to hear if you think there's a more logical spot to put these changes to Nexus / Sled Agent.

Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new instance plumbing LGTM (with my thanks for spending time picking through the merge). I didn't look at the OPTE or service plumbing at all.

hostname: Some(
hardware
.properties
.hostname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily for this PR, but I think it might be worth an filing issue: I'm guessing that if this conversion fails it implies that the hostname in the instance create parameters was completely bogus such that this instance will never be able to start. If that's so we should see about rejecting invalid hostnames in Nexus::project_instance_create so that we reject bad hostnames up front instead of letting the create go through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone and verified this is the case, but I believe that our restrictions on the Name type make them all valid single-label hostnames. It would be good to prove this out though (or use a single consistent type across Omicron and OPTE).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the intent. In RFD 4 names were defined to be compliant with RFC 1035. However, I don't think we ever validated hostname the same way:

/// RFC1035-compliant hostname for the instance.
// TODO-cleanup different type?
pub hostname: String,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Well. I'll see if that's not too bad to fix along with this PR.

@iliana iliana marked this pull request as ready for review October 19, 2023 04:54
@iliana
Copy link
Contributor Author

iliana commented Oct 19, 2023

Two follow-up items I'd like to do are:

  • instance hostname validation, as noted by Dave above
  • verify resolv.conf contains our DNS servers in the end-to-end instance launch test

But we may want to land this before those tasks are done to avoid blocking the release. I can turn these into follow-up issues if we like.

@luqmana
Copy link
Contributor

luqmana commented Oct 19, 2023

But we may want to land this before those tasks are done to avoid blocking the release. I can turn these into follow-up issues if we like.

I think that makes sense. I wouldn't say those are blockers.

@iliana iliana merged commit 93b280c into main Oct 19, 2023
24 checks passed
@iliana iliana deleted the iliana/dhcpcfg branch October 19, 2023 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants