kind,check-cluster-up: Enable Kubevirt CPUManager FG when env supports it #1348

Open
wants to merge 1 commit into
base: main

ormergi
Contributor

@ormergi ormergi commented Jan 19, 2025

What this PR does / why we need it:
The "check-up-kind-sriov" lane was made optional and non-gating because it constantly fails, due to the following test failures:

SRIOV VMI connected to single SRIOV network should have cloud-init meta_data with tagged interface and aligned cpus to sriov interface numa node for VMIs with dedicatedCPUs
SRIOV VMI connected to single SRIOV network [test_id:3959]should create a virtual machine with sriov interface and dedicatedCPUs

These tests started failing the lane after the programmatic skips were removed from the kubevirt/kubevirt tests in kubevirt/kubevirt#13144.
Previously these tests were skipped silently (bad); now, with the programmatic skips removed, they fail loudly.
The root cause of the failures (and of the previous skips) is that the tests depend on Kubevirt's CPUManager feature, which is not enabled at all; see the notes section below for more details.

This PR fixes the lane by enabling Kubevirt's CPUManager feature for environments that support it, i.e. when the SR-IOV provider is used (the SR-IOV provider creates the cluster with the Kubernetes CPU manager on).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  • When Kubevirt's CPUManager feature is on, it labels supporting nodes with the cpumanager=true label (done by the heartbeat controller).
    The failing tests create VMs with the dedicated-CPUs option; Kubevirt sets a cpumanager=true node selector on such a VM's virt-launcher pod.
    The end result is that the tested VMs fail to become ready in time because scheduling is impossible: the VMs have a cpumanager=true node selector, but no node has the cpumanager=true label.
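
A rough way to confirm the mismatch described above is to list the nodes that carry the label the virt-launcher pod selects on. The sketch below is only an illustration: `report_cpumanager_nodes` is a hypothetical helper that is not part of kubevirtci, and it assumes it is fed the output of `kubectl get nodes -l cpumanager=true -o name`.

```shell
# Sketch only: report whether any node carries the label that the
# virt-launcher pod's node selector asks for.
# report_cpumanager_nodes is a hypothetical helper, not part of kubevirtci.

# Reads node names (one per line) on stdin and prints a verdict.
# An empty list means no node matched the cpumanager=true selector.
report_cpumanager_nodes() {
    local nodes
    # Drop blank lines; grep exits non-zero when nothing is left.
    nodes=$(grep -v '^[[:space:]]*$' || true)
    if [ -z "$nodes" ]; then
        echo "unschedulable: no node has cpumanager=true"
        return 1
    fi
    # Unquoted on purpose so several node names end up on one line.
    echo "schedulable on:" $nodes
}

# Usage against a live cluster (not run here):
#   kubectl get nodes -l cpumanager=true -o name | report_cpumanager_nodes
```

With the feature gate off, the node list is expected to be empty, which matches the impossible-scheduling symptom seen in the failing tests.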

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

kind/check-cluster-up.sh enables Kubevirt's CPUManager feature for supporting providers (e.g. SR-IOV).

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Jan 19, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign brianmcarey for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ormergi
Contributor Author

ormergi commented Jan 19, 2025

/test check-up-kind-sriov

@ormergi ormergi force-pushed the check-kind-up-cpumanager branch from 92b836a to 6a5991f Compare January 19, 2025 12:55
@ormergi
Contributor Author

ormergi commented Jan 19, 2025

/cc @EdDev @orelmisan @nirdothan

ormergi added a commit to ormergi/project-infra that referenced this pull request Jan 19, 2025
… constantly failing (kubevirt#3878)"

This reverts commit 86cf6b7.

The PR kubevirt/kubevirtci#1348 fixes the issue
and stabilizes the lane.

Signed-off-by: Or Mergi <[email protected]>
@ormergi ormergi changed the title kind: Enable Kubevirt CPUManager FG when env supports it kind,check-cluster-up: Enable Kubevirt CPUManager FG when env supports it Jan 19, 2025
Member

@orelmisan orelmisan left a comment

Thank you for the PR @ormergi.
Could you please give a few words about why the PR is needed?

@@ -69,6 +69,11 @@ export CRI_BIN=${CRI_BIN:-$(detect_cri)}
fi
${kubectl} wait -n kubevirt kv kubevirt --for condition=Available --timeout 15m

if [[ "$KUBEVIRT_PROVIDER" =~ "sriov" ]]; then
    # Some SR-IOV tests require Kubevirt CPUManager feature
    ${kubectl} patch kubevirts -n kubevirt kubevirt --type=json -p='[{"op": "replace", "path": "/spec/configuration/developerConfiguration/featureGates","value": ["CPUManager"]}]'
Member

Please consider adding a feature gate to the end of the existing list, instead of replacing the whole list, as it will enable future expansion.

Contributor Author

@ormergi ormergi Jan 19, 2025

In case additional FGs need to be enabled, I think there should still be a single patch call with all the necessary FGs.
The FG names can be aggregated and then passed to the patch call.
I didn't export the FG name to a variable because it's the only one at the moment.
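
For illustration, the aggregation described here could look roughly like the sketch below. `build_fg_patch` is a hypothetical helper name, and `ExampleGate` in the usage comment is a placeholder rather than a real feature gate; the JSON path is the one used in this PR's diff.

```shell
# Sketch only: aggregate feature-gate names into one JSON patch body so a
# single patch call can set them all.
# build_fg_patch is a hypothetical helper, not part of kubevirtci.
build_fg_patch() {
    local json="" fg
    for fg in "$@"; do
        # Prefix every entry after the first with a comma separator.
        json="${json:+$json, }\"$fg\""
    done
    printf '[{"op": "replace", "path": "/spec/configuration/developerConfiguration/featureGates", "value": [%s]}]' "$json"
}

# Usage (not run here; ExampleGate is a placeholder name):
#   ${kubectl} patch kubevirts -n kubevirt kubevirt --type=json \
#       -p="$(build_fg_patch CPUManager ExampleGate)"
```

Note that this keeps the `replace` semantics of the diff, so any feature gates already set on the KubeVirt CR would be overwritten by the aggregated list.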

Member

IMO this should not come as a hard dependency of the SR-IOV provider.
Do you see a problem with directly marking the need to have CPUManager as input from the caller?

Also, the patching is odd to me too.

  • Why do you use replace and not a simple add?
  • Why do you think it is better to assume there is only one FG? It will just make it harder for the next contributor to add other FGs in general.
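
A minimal sketch of what these two suggestions might look like together, under stated assumptions: the opt-in variable `KUBEVIRT_ENABLE_CPU_MANAGER` and both helper names are hypothetical and do not exist in kubevirtci, and the JSON path is taken from the diff above. In JSON Patch (RFC 6902), an `add` operation whose path ends in `/-` appends to an existing array, so other feature gates are left in place.

```shell
# Sketch only: caller opt-in plus an appending JSON patch.
# KUBEVIRT_ENABLE_CPU_MANAGER and both helpers are hypothetical names.

# Succeeds only when the caller explicitly passed "true".
cpu_manager_requested() {
    [ "${1:-false}" = "true" ]
}

# Builds a JSON patch that appends one feature gate to the end of the list.
fg_append_patch() {
    printf '[{"op": "add", "path": "/spec/configuration/developerConfiguration/featureGates/-", "value": "%s"}]' "$1"
}

# Usage (not run here):
#   if cpu_manager_requested "${KUBEVIRT_ENABLE_CPU_MANAGER:-}"; then
#       ${kubectl} patch kubevirts -n kubevirt kubevirt --type=json \
#           -p="$(fg_append_patch CPUManager)"
#   fi
```

Two caveats apply to the append form: it fails if the featureGates array does not exist yet, and running it twice appends the same name twice.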

@ormergi
Contributor Author

ormergi commented Jan 19, 2025

Thank you for the PR @ormergi. Could you please give a few words about why the PR is needed?

Done

@ormergi
Contributor Author

ormergi commented Jan 19, 2025

/test check-up-kind-sriov

@orelmisan
Member

Thank you for the PR @ormergi. Could you please give a few words about why the PR is needed?

Done

Thank you.
Could you please add it to the commit message as well?

Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/enhancement size/XS
4 participants