[release-4.17] SDN-4919,OCPBUGS-39200: 4.18 merge - 5th Sept #2291

Open
wants to merge 52 commits into
base: release-4.17

Conversation

martinkennelly
Contributor

/cc
/hold

TBD

arghosh93 and others added 30 commits August 9, 2024 17:21
This changes the pod and join subnets used with a couple of net-attach-defs
in unit tests to satisfy the newly introduced subnet overlap check against
the ClusterNetwork, ServiceNetwork, join switch and masquerade CIDRs.

Signed-off-by: Arnab Ghosh <[email protected]>
UDN API reference generated using the following command:
  crd-ref-docs --source-path ./go-controller/pkg/crd/userdefinednetwork --config=crd-docs-config.yaml --renderer=markdown --output-path=./docs/api-reference/userdefinednetwork-api-spec.md

Signed-off-by: Or Mergi <[email protected]>
The new OVS version is used by the OVN observability.

Signed-off-by: Nadia Pinaeva <[email protected]>
Signed-off-by: Surya Seetharaman <[email protected]>
UDN: Add `MASQUERADE` IPTable Rules
OCPBUGS-38270: Dockerfile: Bump OVS to 3.4.0-1
UDN: allow multiple conditions from different fieldManagers to co-exist in the status.
…nagement-port

UDN: Add RPFilter Loose Mode for management port
Every time a UDN was created, we were adding all the remote nodes for
every network all over again, including the default network. This makes
the checks on the annotations network aware.

Signed-off-by: Tim Rozet <[email protected]>
Services controller:
- move it to base network controller
- start one services controller per primary network
- set up filter in the informer so that only endpointslices for the given network are considered
- pass switch and router names according to the network for a given node.

Move getActiveNetworkForNamespace to CommonNetworkControllerInfo, because the services controller only has access to CommonNetworkControllerInfo at initialization and needs to run getActiveNetworkForNamespace.

Make LBs and LB groups network scoped

Add network name & role to OVN external IDs. In a few places in the code we retrieve all logical switches, routers and load balancers to initialize the services controller or to delete stale entries. With one services controller per network, the OVN lookup must only return OVN elements in the network we're interested in. This is achieved by adding the network name and network role (default, primary, secondary) to the ExternalIDs field of logical switches, routers and load balancers.

Signed-off-by: Riccardo Ravaioli <[email protected]>
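
A rough sketch of the network-scoped lookup described in the commit above: each logical entity carries the network name and role in its external IDs, and a per-network services controller only keeps the entries for its own network. The key names (`example.org/network`, `example.org/role`) and the `logicalEntity` type are hypothetical stand-ins, not the project's real identifiers.

```go
package main

import "fmt"

// Hypothetical external ID keys; the keys used by the real code may differ.
const (
	networkKey = "example.org/network"
	roleKey    = "example.org/role"
)

// logicalEntity is a simplified stand-in for an OVN logical switch, router,
// or load balancer row; only the fields needed for the lookup are modeled.
type logicalEntity struct {
	Name        string
	ExternalIDs map[string]string
}

// filterByNetwork keeps only the entities tagged with the given network name,
// so that one controller never sees (or deletes) another network's entries.
func filterByNetwork(all []logicalEntity, network string) []logicalEntity {
	var out []logicalEntity
	for _, e := range all {
		if e.ExternalIDs[networkKey] == network {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	all := []logicalEntity{
		{"lb-default", map[string]string{networkKey: "default", roleKey: "default"}},
		{"lb-blue", map[string]string{networkKey: "blue", roleKey: "primary"}},
		{"sw-blue", map[string]string{networkKey: "blue", roleKey: "primary"}},
	}
	for _, e := range filterByNetwork(all, "blue") {
		fmt.Println(e.Name, e.ExternalIDs[roleKey])
	}
}
```

The same predicate idea applies to stale-entry cleanup: a controller scoped to one network would only consider entities whose network tag matches its own.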
The existing unit tests for services in services_controller_test are now run for UDN as well.

At the same time, a cleanup of the unit tests was needed, since there was a lot of repetition in the surrounding code, including global and test-specific variables duplicated between services_controller_test.go and lb_config_test.go.

Finally, Test_ETPCluster_NodePort_Service_WithMultipleIPAddresses follows the exact same logic found in TestSyncServices, so let's move it there.

Signed-off-by: Riccardo Ravaioli <[email protected]>
Allows the execution of the network segmentation tests that are in network_segmentation_*.go (e.g. services, endpoint slice mirroring). For instance:

make control-plane WHAT="Network Segmentation: services"

Signed-off-by: Riccardo Ravaioli <[email protected]>
The test creates a client and nodeport service in a UDN backed by one pod and similarly
a nodeport service and a client in the default network.
We verify that:
- UDN client --> UDN service, with backend pod and client running on the same node, is possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

- UDN client --> UDN service, with backend pod and client running on different nodes, is possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

- default-network client --> UDN service is NOT possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

- UDN service --> default-network client is NOT possible through:
  + clusterIP
  + nodeIP:nodePort, where we only target the node where the client runs (*)

(*) TODO connect to other nodes too once ovnkube-node fully supports UDN

TODO: use the same logic as in network_segmentation.go

Signed-off-by: Riccardo Ravaioli <[email protected]>
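
The expectations listed in the commit above can be written down as a small table-driven matrix. This is only a sketch of the structure, assuming hypothetical names (`connCase`, `buildMatrix`); it records what the test is expected to verify and does not perform any real connectivity checks.

```go
package main

import "fmt"

// connCase is one row of the connectivity matrix from the commit message.
type connCase struct {
	Path          string // who talks to whom, worded as in the commit message
	Via           string // "clusterIP" or "nodeIP:nodePort"
	WantReachable bool
}

// buildMatrix expands each path into one case per access method.
func buildMatrix() []connCase {
	paths := []struct {
		name      string
		reachable bool
	}{
		{"UDN client --> UDN service (same node)", true},
		{"UDN client --> UDN service (different nodes)", true},
		{"default-network client --> UDN service", false},
		{"UDN service --> default-network client", false},
	}
	var cases []connCase
	for _, p := range paths {
		for _, via := range []string{"clusterIP", "nodeIP:nodePort"} {
			cases = append(cases, connCase{Path: p.name, Via: via, WantReachable: p.reachable})
		}
	}
	return cases
}

func main() {
	for _, c := range buildMatrix() {
		fmt.Printf("%-46s via %-16s reachable=%t\n", c.Path, c.Via, c.WantReachable)
	}
}
```

Per the (*) note, the nodeIP:nodePort rows only target the node where the client runs for now.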
Signed-off-by: Jaime Caamaño Ruiz <[email protected]>
Use faked iptables in UDN gateway tests
Update Dockerfile.fedora to use pre-released 24.09 ovn rpm.
Fixes remote node checks to be network aware
UDN layer 3 networks also have a join switch and gateway router.

Signed-off-by: Dumitru Ceara <[email protected]>
In the "delete" case we don't need the cookie, move the code that builds
the cookie after the section that checks and takes care of deletes.

Signed-off-by: Dumitru Ceara <[email protected]>
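
The reordering described in the commit above amounts to handling deletes first and returning early, so the cookie is only built when it is actually used. A minimal sketch, assuming hypothetical names (`flowUpdate`, `buildCookie`, `apply`) and using an FNV hash purely as a stand-in for however the real cookie is derived:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// flowUpdate is a hypothetical add-or-delete request keyed by name.
type flowUpdate struct {
	Key    string
	Delete bool
}

// cookieBuilds counts how often the cookie is computed, for illustration.
var cookieBuilds int

// buildCookie derives a cookie string from the key. The hash is a stand-in;
// the real derivation is not described in the commit message.
func buildCookie(key string) string {
	cookieBuilds++
	h := fnv.New64a()
	h.Write([]byte(key))
	return fmt.Sprintf("0x%x", h.Sum64())
}

// apply handles the delete case first and returns early, so the cookie is
// only built on the add/update path.
func apply(store map[string]string, u flowUpdate) {
	if u.Delete {
		delete(store, u.Key)
		return
	}
	store[u.Key] = buildCookie(u.Key)
}

func main() {
	store := map[string]string{}
	apply(store, flowUpdate{Key: "svc-a"})
	apply(store, flowUpdate{Key: "svc-a", Delete: true})
	fmt.Println(len(store), cookieBuilds)
}
```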
… namespace active network

Signed-off-by: Dumitru Ceara <[email protected]>
@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Sep 5, 2024
Contributor

openshift-ci bot commented Sep 5, 2024

@martinkennelly: GitHub didn't allow me to request PR reviews from the following users: martinkennelly.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc
/hold

TBD

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Sep 5, 2024
@martinkennelly martinkennelly changed the title from "SDN-4919: Downstream Merge 28th August" to "[release-4.17] SDN-4919,OCPBUGS-39200: UDN Merge + OVS bump 5th Sept" Sep 5, 2024
@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug label (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) Sep 5, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 5, 2024

@martinkennelly: This pull request references SDN-4919 which is a valid jira issue.

This pull request references Jira Issue OCPBUGS-39200, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-39200 to depend on a bug targeting a version in 4.18.0 and in one of the following states: MODIFIED, ON_QA, VERIFIED, but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/cc
/hold

TBD

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Sep 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: martinkennelly
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

OCPBUGS-39157,SDN-4930: Downstream Merge Sept 4th
@martinkennelly
Contributor Author

/test e2e-gcp-ovn-techpreview

Init issue

@martinkennelly
Contributor Author

/retest

No issues that seem related to this PR.

@martinkennelly martinkennelly changed the title from "[release-4.17] SDN-4919,OCPBUGS-39200: UDN Merge + OVS bump 5th Sept" to "[release-4.17] SDN-4919,OCPBUGS-39200: 4.18 merge - 5th Sept" Sep 5, 2024
@martinkennelly
Contributor Author

Getting the following unit test failures:

Summarizing 3 Failures:

[Fail] Gateway Init Operations Setting up the gateway bridge [It] sets up a local gateway with predetermined interface 
/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node/gateway_init_linux_test.go:1269

[Fail] Gateway Init Operations Setting up the gateway bridge [It] sets up a local gateway with predetermined interface when network-segmentation is enabled 
/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node/gateway_init_linux_test.go:1269

[Fail] Gateway Init Operations Setting up the gateway bridge [It] sets up a local gateway with predetermined interface and no default route 
/go/src/github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node/gateway_init_linux_test.go:1269

Ran 155 of 155 Specs in 78.177 seconds
FAIL! -- 152 Passed | 3 Failed | 0 Pending | 0 Skipped

Will investigate tomorrow.

@tssurya
Contributor

tssurya commented Sep 6, 2024

/retest

@martinkennelly
Contributor Author

/test e2e-metal-ipi-ovn-ipv6-techpreview

@martinkennelly
Contributor Author

/payload 4.17 ci blocking
/payload 4.17 nightly blocking

Contributor

openshift-ci bot commented Sep 6, 2024

@martinkennelly: trigger 4 job(s) of type blocking for the ci release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cf631b60-6c35-11ef-8dc0-278a6fec7ca9-0

trigger 9 job(s) of type blocking for the nightly release of OCP 4.17

  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.17-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.17-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cf631b60-6c35-11ef-8dc0-278a6fec7ca9-1

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

Test passed but exceeded 4 hours and got killed.

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2024-09-06T10:46:20Z"}
INFO[2024-09-06T10:46:20Z] Received signal.                              signal=interrupt
INFO[2024-09-06T10:46:20Z] error: Process interrupted with signal interrupt, cancelling execution... 

@martinkennelly
Contributor Author

/test 4.17-upgrade-from-stable-4.16-e2e-aws-ovn-upgrade

ditto

@martinkennelly
Contributor Author

/test e2e-metal-ipi-ovn-techpreview

nothing indicating this PR is causing failure.

@martinkennelly
Contributor Author

payload jobs are good

@martinkennelly
Contributor Author

martinkennelly commented Sep 6, 2024

Metal tech preview continues to fail in the same manner - nothing obvious sticking out.

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

INFO[2024-09-06T15:21:31Z] Running step e2e-azure-ovn-upgrade-ipi-deprovision-deprovision.
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2024-09-06T15:23:16Z"}

No tests failed.

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

{ ["e2e-azure-ovn-upgrade" pod "e2e-azure-ovn-upgrade-openshift-e2e-test" failed: could not watch pod: context canceled Link to step on registry info site: https://steps.ci.openshift.org/reference/openshift-e2e-test Link to job on registry info site: https://steps.ci.openshift.org/job?org=openshift&repo=ovn-kubernetes&branch=release-4.17&test=e2e-azure-ovn-upgrade, cancelled]}

@tssurya
Contributor

tssurya commented Sep 10, 2024

/hold
don't merge this till we get the bug in services fixed (contact Ricky for details, martin)

@martinkennelly
Contributor Author

/test e2e-azure-ovn-upgrade

Test reached the time limit of 4 hours. No failures reported, but unsure if it's just unreported.
Looked at ovn-k and no issues seen in mg logs.

Contributor

openshift-ci bot commented Sep 10, 2024

@martinkennelly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | 129a097 | link | false | /test security |
| ci/prow/e2e-metal-ipi-ovn-techpreview | 129a097 | link | false | /test e2e-metal-ipi-ovn-techpreview |
| ci/prow/e2e-aws-ovn-kubevirt | 129a097 | link | false | /test e2e-aws-ovn-kubevirt |
| ci/prow/e2e-azure-ovn-upgrade | 129a097 | link | true | /test e2e-azure-ovn-upgrade |

Full PR test history. Your PR dashboard.


@martinkennelly
Contributor Author

Waiting for ovn-org/ovn-kubernetes#4734

Labels
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.