Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-1248: port {azure,gcp}-routes.sh from iptables to nftables #4494

Merged
merged 4 commits into from
Sep 9, 2024

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Jul 26, 2024

iptables is going away in RHEL10; this PR ports {azure,gcp}-routes.sh from iptables to nftables.

What's different:

  • The old azure script created separate iptables and ip6tables rules, but the nftables version just uses the inet family so we can put both sets of rules together (and we don't need to check if IPv6 is enabled or not; the IPv6 rule will just never match anything if IPv6 is disabled.)
  • The old version used an iptables rule per VIP, the new version uses a single nftables rule that does a set match against all the VIPs at once.
  • The old version did "figure out which rules don't belong, then remove then, then add new rules", while the new version does "atomically overwrite the old rules with the new rules".

Other than that (and the commit that removes an unnecessary ACCEPT rule), there are no changes in functionality; I had previously started an attempt to refactor them first (#3619, #3673, #3674, #3675) but wasn't having much luck with that, and we need to get rid of iptables, so...

Copy link
Contributor

openshift-ci bot commented Jul 26, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 26, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2024
@danwinship danwinship changed the title port {alibabacloud,azure,gcp}-routes.sh from iptables to nftables port {azure,gcp}-routes.sh from iptables to nftables Jul 26, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2024
@danwinship
Copy link
Contributor Author

/test all
/test

Copy link
Contributor

openshift-ci bot commented Jul 26, 2024

@danwinship: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test 4.12-upgrade-from-stable-4.11-images
  • /test cluster-bootimages
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-upgrade
  • /test e2e-gcp-op
  • /test e2e-gcp-op-single-node
  • /test e2e-hypershift
  • /test images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade
  • /test bootstrap-unit
  • /test e2e-aws-disruptive
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-fips-op
  • /test e2e-aws-ovn-upgrade-out-of-change
  • /test e2e-aws-ovn-workers-rhel8
  • /test e2e-aws-proxy
  • /test e2e-aws-serial
  • /test e2e-aws-single-node
  • /test e2e-aws-upgrade-single-node
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-ovn-upgrade
  • /test e2e-azure-ovn-upgrade-out-of-change
  • /test e2e-azure-upgrade
  • /test e2e-gcp-op-techpreview
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-gcp-rt
  • /test e2e-gcp-rt-op
  • /test e2e-gcp-single-node
  • /test e2e-gcp-upgrade
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-dualstack
  • /test e2e-openstack-externallb
  • /test e2e-openstack-parallel
  • /test e2e-ovirt
  • /test e2e-ovirt-upgrade
  • /test e2e-ovn-step-registry
  • /test e2e-vsphere
  • /test e2e-vsphere-ovn-upi
  • /test e2e-vsphere-ovn-upi-zones
  • /test e2e-vsphere-ovn-zones
  • /test e2e-vsphere-upgrade
  • /test okd-e2e-aws
  • /test okd-e2e-gcp-op
  • /test okd-e2e-upgrade
  • /test okd-e2e-vsphere
  • /test okd-images
  • /test okd-scos-images
  • /test security

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-machine-config-operator-master-bootstrap-unit
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-upgrade-out-of-change
  • pull-ci-openshift-machine-config-operator-master-e2e-azure-ovn-upgrade-out-of-change
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-techpreview
  • pull-ci-openshift-machine-config-operator-master-e2e-hypershift
  • pull-ci-openshift-machine-config-operator-master-e2e-vsphere-ovn-upi
  • pull-ci-openshift-machine-config-operator-master-e2e-vsphere-ovn-upi-zones
  • pull-ci-openshift-machine-config-operator-master-e2e-vsphere-ovn-zones
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-security
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify

In response to this:

/test all
/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@danwinship
Copy link
Contributor Author

/test all

@danwinship danwinship changed the title port {azure,gcp}-routes.sh from iptables to nftables WIP: port {azure,gcp}-routes.sh from iptables to nftables Jul 27, 2024
@danwinship danwinship marked this pull request as ready for review July 27, 2024 11:57
@danwinship
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade 5

Copy link
Contributor

openshift-ci bot commented Jul 27, 2024

@danwinship: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fa4ddbd0-4c3c-11ef-8c5d-c73ffa4621dc-0

@danwinship
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade 6
/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 6

Copy link
Contributor

openshift-ci bot commented Jul 27, 2024

@danwinship: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/91e1b5e0-4c5e-11ef-9aa0-de06f78ca57b-0

Change them to say "OVN-Kubernetes is not running" rather than "Plugin
is SDN", since the plugin is never SDN any more, and the script gets
run before ovn-k starts up anyway, so previously it was logging
"Plugin is SDN" for a while at startup.

Also remove the log message from remove_stale_routes, since that
function is always run along with add_routes, so we don't need to log
them same thing in both of them.
gcp-routes had a rule "so that existing flows (with an entry in
conntrack) continue to be balanced, even if the DNAT entry is removed"
(which was then also copied into azure-routes and alibaba-routes). The
only way this iptables rule would actually be needed is if (a) your
masters have an iptables-based firewall (which they shouldn't, on
OCP), and (b) the firewall is so aggressive that it even drops packets
from established connections (which no firewall should do anyway).

At any rate, even if the rule *was* necessary in some clusters, it
won't work in future nftables-only versions of RHCOS anyway, because
nftables doesn't let "accept" rules in one table override
"drop"/"reject" rules in another table; if your firewall is broken and
dropping packets that it shouldn't, you have to actually fix your
firewall rules, not hack around them somewhere else.
@danwinship danwinship changed the title WIP: port {azure,gcp}-routes.sh from iptables to nftables MCO-1248: port {azure,gcp}-routes.sh from iptables to nftables Jul 29, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 29, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 29, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 29, 2024

@danwinship: This pull request references MCO-1248 which is a valid jira issue.

In response to this:

iptables is going away in RHEL10; this PR ports {azure,gcp}-routes.sh from iptables to nftables.

What's different:

  • The old azure script created separate iptables and ip6tables rules, but the nftables version just uses the inet family so we can put both sets of rules together (and we don't need to check if IPv6 is enabled or not; the IPv6 rule will just never match anything if IPv6 is disabled.)
  • The old version used an iptables rule per VIP, the new version uses a single nftables rule that does a set match against all the VIPs at once.
  • The old version did "figure out which rules don't belong, then remove then, then add new rules", while the new version does "atomically overwrite the old rules with the new rules".

Other than that (and the commit that removes an unnecessary ACCEPT rule), there are no changes in functionality; I had previously started an attempt to refactor them first (#3619, #3673, #3674, #3675) but wasn't having much luck with that, and we need to get rid of iptables, so...

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@danwinship
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 6

1 similar comment
@danwinship
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade 6

Copy link
Contributor

openshift-ci bot commented Jul 30, 2024

@danwinship: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7163e640-4e74-11ef-85b3-ebdf6b3ccdbe-0

@danwinship
Copy link
Contributor Author

ok, gcp payload passed in the first run, azure payload passed in the second run (after failing all 6 times in the original run because of a bug, showing that the payload test does actually depend on this script working)

@jcaamano
Copy link
Contributor

jcaamano commented Aug 6, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 6, 2024
@danwinship
Copy link
Contributor Author

/assign @djoshy

@djoshy
Copy link
Contributor

djoshy commented Aug 7, 2024

/approve

I'll admit this is not my domain so I'll defer to @jcaamano. The payloads look good too. Holding in case this requires QE testing, but feel free to unhold if not required.

Copy link
Contributor

openshift-ci bot commented Aug 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, djoshy, jcaamano

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2024
@djoshy
Copy link
Contributor

djoshy commented Aug 7, 2024

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2024
@danwinship
Copy link
Contributor Author

/hold cancel

The failure mode for this is that a very small percentage of apiserver connections would fail during upgrades, which we have checks for in the upgrade jobs, but which is hard to notice in human testing (since the failed connections would get retried so everything still works, just a little bit more slowly than it should have). So I think the payload jobs validate this better than QE could.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 8, 2024
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 217f607 and 2 for PR HEAD 6989891 in total

@danwinship
Copy link
Contributor Author

/retest

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 9c7a000 and 1 for PR HEAD 6989891 in total

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 2b5d682 and 0 for PR HEAD 6989891 in total

@openshift-ci-robot
Copy link
Contributor

/hold

Revision 6989891 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 9, 2024
@danwinship
Copy link
Contributor Author

/hold cancel
/retest-required
infra flakes

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 9, 2024
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD da94f1b and 2 for PR HEAD 6989891 in total

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 6caab8b and 1 for PR HEAD 6989891 in total

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 16d393f and 0 for PR HEAD 6989891 in total

@openshift-ci-robot
Copy link
Contributor

/hold

Revision 6989891 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2024
@djoshy
Copy link
Contributor

djoshy commented Sep 9, 2024

/unhold
/retest-required

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit f1e1137 into openshift:master Sep 9, 2024
13 of 17 checks passed
@danwinship danwinship deleted the nftables branch September 9, 2024 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants