
Update vpc.max.networks setting & settings to limit the number of NICs for each hypervisor #8654

Open
wants to merge 19 commits into main

Conversation

@hsato03 (Collaborator) commented Feb 14, 2024

Description

Each hypervisor can support a different number of network adapters. Comparing KVM and VMware: VMware defines a limited number of NICs for each ESXi machine (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%207.0&categories=1-0), while the number of tiers that can be allocated with KVM depends on the number of available PCI slots. For example, KVM provides 32 PCI slots, which are shared among several devices (e.g., CD-ROM, keyboard). Every ACS VR already consumes 9 of the 32 available slots; thus, on KVM, 23 slots remain for new tiers.

This PR updates the vpc.max.networks setting to ConfigKey and changes its scope to the cluster level, as the maximum number of networks per VPC depends on the hypervisor where the VR is deployed.

The setting value is defined based on the cluster in which the VPC's VR is running, using the lowest value found if the VPC has more than one VR and they are in different clusters. If the VPC does not have a VR, the value defined in the global setting is used.
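As a sketch of the resolution order described above (cluster-scoped values when VRs exist, lowest value wins, global fallback otherwise), the logic can be illustrated as follows. The class, method, and default value here are hypothetical illustrations, not the actual CloudStack implementation:

```java
import java.util.List;

public class VpcMaxNetworksSketch {
    // Hypothetical global vpc.max.networks value (assumed for illustration)
    static final int GLOBAL_DEFAULT = 3;

    // clusterValues: the cluster-scoped vpc.max.networks value of each
    // cluster that hosts one of the VPC's VRs
    static int effectiveMaxNetworks(List<Integer> clusterValues) {
        if (clusterValues.isEmpty()) {
            // The VPC has no VR: fall back to the global setting
            return GLOBAL_DEFAULT;
        }
        // VRs in several clusters: the lowest (most restrictive) value wins
        return clusterValues.stream().min(Integer::compareTo).get();
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxNetworks(List.of()));     // no VR: 3
        System.out.println(effectiveMaxNetworks(List.of(7)));    // one cluster: 7
        System.out.println(effectiveMaxNetworks(List.of(7, 5))); // two clusters: 5
    }
}
```

This mirrors the test scenario described further below (global = 3, cluster-test = 7, cluster-test2 = 5).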

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I changed the vpc.max.networks value in the following resources:

  • Global setting: 3;
  • cluster-test: 7;
  • cluster-test2: 5.

I created 3 VPCs:

  • VPC1: without VRs;
  • VPC2: VR within cluster-test;
  • VPC3: VRs within cluster-test and cluster-test2.

Then, I verified that the vpc.max.networks setting value was:

  • 3 in VPC1;
  • 7 in VPC2;
  • 5 in VPC3.

codecov bot commented Feb 14, 2024

Codecov Report

Attention: Patch coverage is 88.73239%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 30.83%. Comparing base (12f65fb) to head (63ab94a).
Report is 5 commits behind head on main.

Files Patch % Lines
...ain/java/com/cloud/network/vpc/VpcManagerImpl.java 72.72% 4 Missing and 2 partials ⚠️
...tack/engine/orchestration/NetworkOrchestrator.java 94.44% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8654      +/-   ##
============================================
- Coverage     31.04%   30.83%   -0.21%     
+ Complexity    33902    33647     -255     
============================================
  Files          5404     5405       +1     
  Lines        380305   380371      +66     
  Branches      55506    55519      +13     
============================================
- Hits         118056   117298     -758     
- Misses       246496   247468     +972     
+ Partials      15753    15605     -148     
Flag Coverage Δ
simulator-marvin-tests 24.42% <60.56%> (-0.35%) ⬇️
uitests 4.34% <ø> (ø)
unit-tests 16.89% <76.05%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@weizhouapache (Member)

@hsato03

I agree this setting should be dynamic, and it would be better as a zone-level setting. But a VPC does not belong to a cluster, and VMs in a VPC might run on different hypervisors, in different clusters. IMHO, it does not make sense as a cluster-level setting.

@hsato03 (Collaborator, Author) commented Feb 14, 2024

@weizhouapache

> I agree this setting should be dynamic, and it is better to be a zone-level setting.

Since the maximum number of networks per VPC depends on the hypervisor where the VR is deployed (as pointed out in the description), I don't think it makes sense to change it to the zone level, as the cluster is the highest-level structure at which a hypervisor is defined.

> But, a VPC does not belong to a cluster, and VMs in a VPC might run on different hypervisors, in different clusters.

The setting value is defined by the cluster in which the VPC's VR is running, using the lowest value found if the VPC has more than one VR and they are in different clusters (https://github.com/apache/cloudstack/pull/8654/files#diff-07bd71d9f58832d4429d7743f4887188a96aacc913dc48e0101470147ce42032R1893-R1922). Regarding "VMs in a VPC might run on different hypervisors": VMs running on different hypervisors do not impact VPC creation; therefore, it does not make sense to consider them in the algorithm.

@weizhouapache (Member)

> @weizhouapache
>
> > I agree this setting should be dynamic, and it is better to be a zone-level setting.
>
> Since the maximum number of networks per VPC depends on the hypervisor where the VR is deployed (as pointed out in the description), I don't think it makes sense to change it to the zone level, as the cluster is the highest-level structure at which a hypervisor is defined.
>
> > But, a VPC does not belong to a cluster, and VMs in a VPC might run on different hypervisors, in different clusters.
>
> The setting value is defined by the cluster in which the VPC's VR is running, using the lowest value found if the VPC has more than one VR and they are in different clusters (https://github.com/apache/cloudstack/pull/8654/files#diff-07bd71d9f58832d4429d7743f4887188a96aacc913dc48e0101470147ce42032R1893-R1922). VMs running on different hypervisors do not impact VPC creation; therefore, it does not make sense to consider them in the algorithm.

I can understand your code, but your requirement is not clear to me. Can you explain more about the infrastructure? E.g., the hypervisor types, why the max VPC networks differ, and what the issues are. Could it be a domain- or account-level setting?

@hsato03 (Collaborator, Author) commented Feb 15, 2024

> I can understand your code, but your requirement is not clear to me. Can you explain more about the infrastructure? E.g., the hypervisor types, why the max VPC networks differ, and what the issues are. Could it be a domain- or account-level setting?

@weizhouapache Each hypervisor can support a different number of network adapters. Comparing KVM and VMware: VMware defines a limited number of NICs for each ESXi machine (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%207.0&categories=1-0), while the number of tiers that can be allocated with KVM depends on the number of available PCI slots. For example, KVM provides 32 PCI slots, which are shared among several devices (e.g., CD-ROM, keyboard). Every ACS VR already consumes 9 of the 32 available slots; thus, on KVM, 23 slots remain for new tiers.

Therefore, in an environment with KVM and VMware clusters under the same zone, applying the VMware limit to KVM is not desirable, as a VPC on KVM supports far more tiers than on VMware.

I will update the PR's description to make it clearer.

@weizhouapache (Member)

> > I can understand your code, but your requirement is not clear to me. Can you explain more about the infrastructure? E.g., the hypervisor types, why the max VPC networks differ, and what the issues are. Could it be a domain- or account-level setting?
>
> @weizhouapache Each hypervisor can support a different number of network adapters. Comparing KVM and VMware: VMware defines a limited number of NICs for each ESXi machine (https://configmax.esp.vmware.com/guest?vmwareproduct=vSphere&release=vSphere%207.0&categories=1-0), while the number of tiers that can be allocated with KVM depends on the number of available PCI slots. For example, KVM provides 32 PCI slots, which are shared among several devices (e.g., CD-ROM, keyboard). Every ACS VR already consumes 9 of the 32 available slots; thus, on KVM, 23 slots remain for new tiers.
>
> Therefore, in an environment with KVM and VMware clusters under the same zone, applying the VMware limit to KVM is not desirable, as a VPC on KVM supports far more tiers than on VMware.
>
> I will update the PR's description to make it clearer.

@hsato03 that makes perfect sense.

However, the NICs of a VPC VR include:

  • 1 control NIC
  • multiple guest NICs
  • multiple public NICs

I would suggest creating a setting for each hypervisor, e.g. virtual.machine.max.nics.kvm and virtual.machine.max.nics.vmware, which considers all types of NICs, and checking the value when:

  • adding a guest NIC to a VR (of VPCs and isolated networks, and even shared networks)
  • adding public NICs to a VR (of VPCs and isolated networks, and even shared networks)
  • adding NICs to user VMs
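A minimal sketch of the check being suggested here. The setting names virtual.machine.max.nics.kvm / virtual.machine.max.nics.vmware come from the comment above; the enum, the limit values (23 from the KVM PCI-slot discussion, 10 from the vSphere 7.0 per-VM vNIC maximum), and the method itself are illustrative assumptions, not actual CloudStack code:

```java
import java.util.Map;

public class NicLimitSketch {
    enum HypervisorType { KVM, VMware }

    // Assumed values for virtual.machine.max.nics.<hypervisor>; the real
    // limits would come from configurable settings, not constants
    static final Map<HypervisorType, Integer> MAX_NICS = Map.of(
            HypervisorType.KVM, 23,
            HypervisorType.VMware, 10);

    // Would run before plugging any NIC (control, guest, or public) into
    // a user VM or a VR
    static void checkNicLimit(HypervisorType type, int currentNicCount) {
        int max = MAX_NICS.get(type);
        if (currentNicCount + 1 > max) {
            throw new IllegalStateException(String.format(
                    "Cannot add a NIC: the instance already has %d of the %d NICs allowed on %s",
                    currentNicCount, max, type));
        }
    }

    public static void main(String[] args) {
        checkNicLimit(HypervisorType.KVM, 9); // a VR with 9 NICs can still grow on KVM
        try {
            checkNicLimit(HypervisorType.VMware, 10); // already at the assumed limit
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```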

@weizhouapache (Member)

@hsato03
What do you think? The limitation would apply to all VMs, including user VMs and VRs.

> @hsato03 that makes perfect sense.
>
> However, the NICs of a VPC VR include:
>
>   • 1 control NIC
>   • multiple guest NICs
>   • multiple public NICs
>
> I would suggest creating a setting for each hypervisor, e.g. virtual.machine.max.nics.kvm and virtual.machine.max.nics.vmware, which considers all types of NICs, and checking the value when:
>
>   • adding a guest NIC to a VR (of VPCs and isolated networks, and even shared networks)
>   • adding public NICs to a VR (of VPCs and isolated networks, and even shared networks)
>   • adding NICs to user VMs

@hsato03 (Collaborator, Author) commented Feb 21, 2024

@weizhouapache Thanks for your suggestion.

I agree that this situation should include VMs and VRs, but the vpc.max.networks setting should still exist if the user wants to limit the number of tiers in a VPC regardless of the hypervisor. What do you think about delegating the hypervisor limit check to the settings you suggested and, in that case, changing the scope of vpc.max.networks to the account level?

@weizhouapache (Member)

> @weizhouapache Thanks for your suggestion.
>
> I agree that this situation should include VMs and VRs, but the vpc.max.networks setting should still exist if the user wants to limit the number of tiers in a VPC regardless of the hypervisor.

Agreed.

> What do you think about delegating the hypervisor limit check to the settings you suggested and, in that case, changing the scope of vpc.max.networks to the account level?

It looks good. Looking forward to your changes.

@hsato03 hsato03 marked this pull request as draft February 23, 2024 15:19
github-actions bot commented Mar 8, 2024

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@hsato03 hsato03 changed the title Update vpc.max.networks setting Update vpc.max.networks setting & settings to limit the number of NICs for each hypervisor Mar 19, 2024
@hsato03 hsato03 marked this pull request as ready for review March 21, 2024 19:52
@DaanHoogland (Contributor)

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 9570

@hsato03 (Collaborator, Author) commented May 31, 2024

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 9751

github-actions bot

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland (Contributor)

@hsato03

21:06:02 [INFO] -------------------------------------------------------------
21:06:07 [ERROR] COMPILATION ERROR : 
21:06:07 [INFO] -------------------------------------------------------------
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1143,15] error: incompatible types: HypervisorType cannot be converted to int
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1144,17] error: cannot find symbol
21:06:07   symbol:   variable KVM
21:06:07   location: class NetworkOrchestrator
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1147,17] error: cannot find symbol
21:06:07   symbol:   variable VMware
21:06:07   location: class NetworkOrchestrator
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1150,17] error: cannot find symbol
21:06:07   symbol:   variable XenServer
21:06:07   location: class NetworkOrchestrator
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1167,15] error: incompatible types: HypervisorType cannot be converted to int
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1168,17] error: cannot find symbol
21:06:07   symbol:   variable KVM
21:06:07   location: class NetworkOrchestrator
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1171,17] error: cannot find symbol
21:06:07   symbol:   variable VMware
21:06:07   location: class NetworkOrchestrator
21:06:07 [ERROR] /jenkins/workspace/acs-centos7-pkg-builder/dist/rpmbuild/BUILD/cloudstack-4.20.0.0-SNAPSHOT/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java:[1174,17] error: cannot find symbol
21:06:07   symbol:   variable XenServer
21:06:07   location: class NetworkOrchestrator
21:06:07 [INFO] 8 errors

I thought it might be a merge problem, but note that the GHA build also fails with the same error: https://github.com/apache/cloudstack/actions/runs/8935283845/job/24543531187?pr=8654#step:7:18232

@hsato03 (Collaborator, Author) commented Jul 11, 2024

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10329

@JoaoJandre (Contributor)

@blueorangutan package

@blueorangutan

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10457

@JoaoJandre (Contributor)

@blueorangutan package

@blueorangutan

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 10643

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 10662

github-actions bot commented Sep 6, 2024

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@hsato03 (Collaborator, Author) commented Oct 16, 2024

@blueorangutan package

@blueorangutan

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11368

@DaanHoogland (Contributor)

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11710)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 56931 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8654-t11710-kvm-ol8.zip
Smoke tests completed. 140 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_07_vpc_and_tier_with_routed_mode Error 2.34 test_ipv4_routing.py
test_09_connectivity_between_network_and_vpc_tier Failure 0.12 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 7.80 test_ipv4_routing.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Error 412.71 test_vpc_redundant.py

@DaanHoogland (Contributor)

> [SF] Trillian test result (tid-11710)
> Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
> Total time taken: 56931 seconds
> Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8654-t11710-kvm-ol8.zip
> Smoke tests completed. 140 look OK, 2 have errors, 0 did not run
> Only failed and skipped tests results shown below:
>
> Test Result Time (s) Test File
> test_07_vpc_and_tier_with_routed_mode Error 2.34 test_ipv4_routing.py
> test_09_connectivity_between_network_and_vpc_tier Failure 0.12 test_ipv4_routing.py
> test_12_vpc_and_tier_with_dynamic_routed_mode Error 7.80 test_ipv4_routing.py
> test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Error 412.71 test_vpc_redundant.py

@hsato03 these errors seem related to the change, can you have a look?
