-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Host Network Attachment Definition #192
base: main
Are you sure you want to change the base?
Conversation
…t definition By including the Host nad yaml file we are able to centralize the NAD setup wihtin the operator itself Currently the Namespace follows the CDA implementation where it uses the Default Namespace
…oNAD to include host setup as well - The idea is to move the check from outside the nad and always reconcile the NAD and apply the appropriate manifest but currently the auto mode can't be used to apply the Host NAD since it requires the daemon so we keep a check to skip it. - This implementation has problems with auto since this mode relies on the daemon, which opens a bigger discussion on whether managing the NAD is the job of the daemon or the controller.
We should skip deploying the NAD ourselves if the configuration spec mode is `auto`, since this implementation can't auto detect the hardware in the controller.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: SamD2021 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @SamD2021. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@@ -99,14 +99,13 @@ func (r *DpuOperatorConfigReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
return ctrl.Result{}, err | |||
} | |||
|
|||
if dpuOperatorConfig.Spec.Mode == "dpu" { | |||
if dpuOperatorConfig.Spec.Mode != "auto" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bn222 @wizhaoredhat discussed this with Sam. Should we move creating the net-attach-def for the host to the daemon as well?
The issue being that the top level operator does not have access to the platform info to detect whether it should create a NAD for the host-side or dpu-side, other than the dpu operator config spec.
However I don't think we can rely on this spec in the case we auto detect mode.
I know Balazs is working on a refactor to the CRs, so maybe this will clear this point up.
As part of this PR I think we should remove creating the nad in CDA, and add a bump to CDA as a commit to make sure e2e tests are still working. |
As part of bringing Host DPU NAD creation into the DPU-Operator,we need to also remove the steps where the cda creates the host dpu nad.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
"name": "dpu-cni", | ||
"ipam": { | ||
"type": "host-local", | ||
"subnet": "10.56.217.0/24" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how useful this change is for the host is.
- As you noticed "ipam" can be configured by the nad. The IP Address "10.56.217.0/24"" is not relevant for most if not all customer setups. Yes it is relevant for CDA, but we aren't building this for CDA.
- Also there are different types of IPAMs
- dpu-cni could configure parameters like MTU on the host.
- In short, the reason why I only did the NAD for the DPU is that it is assumed the DPU's Network Functions will be created by the operator only. But the host NAD would be consumed by workload pods that are outside our control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
host NAD will be created by the operator as a consequence of creating a DpuNetwork CR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that the DpuNetwork CR must contain IPAM settings for the NAD in the future.
TL;DR
This PR makes some changes to how we handle Network Attachment Definitions (NAD) in the DPU Operator. The goal is to centralize the NAD setup, and automate the deployment of the NAD for the Host. I'm also hoping to discuss if these changes should be moved to the daemon so we can support auto mode as well.
What’s Changed
Host NAD YAML:
Added a manifest for Host NAD to centralize its configuration in the operator itself.
The namespace follows what we use in Cluster Deployment Automation (CDA), so the default resources get nad properly.
NAD Reconciliation:
Changed the check for NAD to ensure both Host and DPU mode have deployed its NADs while skipping it for Auto mode.
Auto Mode Handling:
If the configuration is set to auto, the operator will skip NAD deployment since the controller can’t detect the hardware in this mode. This keeps things cleaner and avoids unnecessary errors.
Why These Changes Matter
By centralizing the Host NAD setup and improving how we handle reconciliation, this makes the operator more reliable and lessens the burden of configuration from the user. I also hope to have a discussion about whether NAD management should be the job of the controller or the daemon, since the latter is the one that can auto detect the hardware.