Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overlay.d: add 35coreos-iptables #1324

Merged
merged 3 commits into from
Feb 16, 2022
Merged

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Nov 3, 2021

From overlay.d/README.md:

Contains systemd service and script for remaining on iptables-nft after
the migration to nft.

Split out because (1) it will roll out to next first, and (2) it can
more easily be deleted after the barrier release.

For more details, see:
coreos/fedora-coreos-tracker#676
#1324

@jlebon
Copy link
Member Author

jlebon commented Nov 3, 2021

Need to add tests for this. But also, we can't merge this until we actually officially start the deprecation window and wait the 3 months.

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @jlebon. A few comments/questions:

LEGACY_TARGETS="ip6tables-legacy ip6tables-legacy-restore ip6tables-legacy-save iptables-legacy iptables-legacy-restore iptables-legacy-save"

# sanity-check the stamp file is present
if [ ! -e /sysroot/etc/fedora-coreos/iptables-legacy.stamp ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.. We're starting to use the /etc/fedora-coreos folder for things now. I really think we should create a docs page for everything we've ever put in this folder (i..e will be useful in the future) that explains or links to reasonings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me! Let's try to find someone to tackle this bit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in 05core though. Should it be in 15fcos instead? If not, maybe we should be using /etc/coreos here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration specifically, RHCOS has already done it. Though I like the idea of keeping the door open for things that do also affect RHCOS. So leaning towards using /etc/coreos, and putting the specific things that leverage the directory in either 05core or 15fcos depending on whether it needs to apply to RHCOS too (so this one for example, would go in 15fcos). Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

Comment on lines 2 to 5
Description=CoreOS Enable iptables-legacy
ConditionPathExists=/etc/initrd-release
DefaultDependencies=false
ConditionPathExists=/sysroot/etc/fedora-coreos/iptables-legacy.stamp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we limit this unit to running only once? It only needs to run once, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good question. I think it would be less confusing if it's "bound" to the stamp file that already exists instead of creating another one. One case I can imagine is someone upgrading to an iptables-nft-default release, then deploying back an iptables-legacy-default release and then upgrading again. If we ran once only, we wouldn't reapply the legacy backend, but we should.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of torn on this one. I like the use of a systemd unit like this for "migration" or "run once" type stuff but not as much for an every boot kind of thing. For example, what if someone decides in the future they want to switch to using NFT and they log into their systems and fix the symlinks themselves (not realizing /sysroot/etc/fedora-coreos/iptables-legacy.stamp) exists. After a reboot they're back to legacy iptables.

This is a contrived example, but what I'm trying to argue for is not having 2nd order configuration (i.e. configuration that dynamically controls other configuration) that runs every boot. It starts to get confusing real quick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that I don't want to make this conditional on another stamp file. So maybe it's simplest if we just have the script delete it once it's done. Then it's naturally one shot. It also solves my other concern of the stamp file living forever and possibly not being in sync with what's in /etc.

Copy link
Member

@dustymabe dustymabe Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I actually kind of like having a second stamp file :). Mainly because there is one that was the original request (requesting for the service to do something) and there is a second one to let us know the service ran (in case the journal had been wrapped and we no longer have that information in journalctl). I agree it's not great the have the stamp files hang around forever, but then again it could be nice (given our new use of /etc/fedora-coreos/) to add a datestamp (or some other time indicator) to the stamp files (i.e. /etc/fedora-coreos/202203-iptables-legacy.stamp) and we can just let them collect, almost like a little log of system history. Given that we periodically have migrations like this, it could be useful to see them all in one place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this and I think we might be tackling this slightly wrong. This is introducing an API that allows new installs to default to iptables-legacy using a stamp file. Do we want to commit to that? For how long? I think instead it should be a documented Butane snippet setting the symlinks (or if we really want, some Butane sugar for it).

So then this here would be concerned strictly about upgrading nodes. In which case we can delete this script after the next barrier.

Re. keeping the stamp file in /etc as a "log", I'm still not convinced. :) But we could also add a stamp file in /var/lib as a marker that we ran if we really want something more permanent (but note also that ostree admin config-diff is the final source of truth for what local changes have been made anyway).

@bgilbert Any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on not mutating configuration on every boot. Also agree that this is currently introducing an API with unbounded lifetime, sort of an "anti-feature flag". Given the lack of consensus on the feature flags proposal in coreos/fedora-coreos-tracker#892, I think it'd be better to avoid that. The same reasoning mitigates against Butane sugar. A documented Butane snippet for new installs seems reasonable to me; it's consistent with other migrations we've done in the past.

As to retaining the upgrade stamp forever, I'm also not convinced. The symlinks themselves are the source of truth for which mode we're in, and how we got there seems less relevant. When old journal entries age out, by definition that's an indication that we no longer care about the corresponding slice of historical record, so I don't think we need to go out of our way to stash the info somewhere else.

@dustymabe
Copy link
Member

as part of this PR let's also update the iptables-legacy test (it should now create the stamp file via Ignition) and add a new test for iptables-nft. The iptables-nft test should be able to be run on both FCOS and RHCOS.

@jlebon jlebon force-pushed the pr/iptables-nft branch 2 times, most recently from bccb410 to 278b605 Compare November 9, 2021 21:29
@jlebon
Copy link
Member Author

jlebon commented Nov 9, 2021

Updated this, but haven't sanity-checked manually myself. Let's see what CI says.

travier
travier previously approved these changes Nov 12, 2021
@jlebon
Copy link
Member Author

jlebon commented Dec 6, 2021

Rebased and tweaked slightly based on discussions. Still need to address feedback from #1324 (comment).

@jlebon jlebon force-pushed the pr/iptables-nft branch 2 times, most recently from f47ce46 to 116c83c Compare December 7, 2021 15:56
@jlebon
Copy link
Member Author

jlebon commented Dec 7, 2021

Now with #1324 (comment) addressed! I've also updated the email draft to reflect the latest approach. Let's try to send it this week?

local symlink=$1; shift
# check that the deployment is still using the symlink (i.e. the user didn't
# do something funky), and that the OSTree default is still symlink-based
# (i.e. that we didn't change strategy and forgot to update this script)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# (i.e. that we didn't change strategy and forgot to update this script)

Should we hard error in that case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not easy to tell those two cases apart though (this is related to the other comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also in general, I think for this we should try to be really good at detecting the state we expect, but otherwise just leave everything untouched without alarming the user. These updating machines could be ancient and carry a lot of state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think I was mostly just assuming if this hard fails we'll catch it in CI and resolve the problem. I agree, though, that if we wanted that it should be part of a CI test and not part of these units.

Comment on lines 22 to 42
echo "Executable $path no longer present; exiting."
exit 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like this should be a hard error. If this is true we failed IMO. Maybe we should add an appropriate check in our kola tests to make sure this is never true... Or I guess iptables --version | grep legacy (in the kola test already) is sufficient?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to be resilient to e.g. some layered package or some customization having changed the tree (or e.g. with coreos/enhancements#7, that'd be trivially possible though this script probably won't live that long).

Or I guess iptables --version | grep legacy (in the kola test already) is sufficient?

👍 Agreed.

manifests/fedora-coreos-base.yaml Outdated Show resolved Hide resolved
@dustymabe
Copy link
Member

Carrying forward the converstaion from this thread in #1324 (comment) you said This is introducing an API that allows new installs to default to iptables-legacy using a stamp file. Is that not still true in the current code? If not, what mechanism is preventing someone from creating the stamp file using Ignition and having the service run? ConditionKernelCommandLine=!ignition.firstboot means it won't run on the Ignition boot, but what about a second boot?

@jlebon
Copy link
Member Author

jlebon commented Dec 7, 2021

Carrying forward the converstaion from this thread in #1324 (comment) you said This is introducing an API that allows new installs to default to iptables-legacy using a stamp file. Is that not still true in the current code? If not, what mechanism is preventing someone from creating the stamp file using Ignition and having the service run? ConditionKernelCommandLine=!ignition.firstboot means it won't run on the Ignition boot, but what about a second boot?

Yeah, I've thought about this and tried to find a succinct way to block it without introducing yet another stamp file, but couldn't think of one. I don't think it's a real concern though, do you? The way it's set up, it's clearly not ergonomic to use that way and we do provide the Butane snippet to use. There's always a chance someone can abuse or disable the safeguards you've put in place so it didn't seem productive to worry about it.

@dustymabe
Copy link
Member

dustymabe commented Dec 7, 2021

Yeah, I've thought about this and tried to find a succinct way to block it without introducing yet another stamp file, but couldn't think of one.

What about using the aleph-version as another knob for the script?

$ jq -r .build /sysroot/.coreos-aleph-version.json
35.20211119.1.0

So we could allow up to a certain version.

Somewhat related: we should probably have a library or something for comparing versions of FCOS.

@jlebon
Copy link
Member Author

jlebon commented Dec 7, 2021

Yeah, I've thought about this and tried to find a succinct way to block it without introducing yet another stamp file, but couldn't think of one.

What about using the aleph-version as another knob for the script?

$ jq -r .build /sysroot/.coreos-aleph-version.json
35.20211119.1.0

So we could allow up to a certain version.

Hmm yeah, maybe...It'd have to be a version after or equal to the one where we switch to nft. I guess we edit this PR at the last minute when it's ready to be merged, and use the last release version numbers? Not sure. I think it'd be harder to screw up if we just add a stamp file if we really care about this (which I'm still not sure we should).

@travier
Copy link
Member

travier commented Dec 8, 2021

What about using the aleph-version as another knob for the script?

$ jq -r .build /sysroot/.coreos-aleph-version.json
35.20211119.1.0

So we could allow up to a certain version.

Somewhat related: we should probably have a library or something for comparing versions of FCOS.

This is very similar to the discussion in coreos/fedora-coreos-tracker#892 (comment).

@jlebon
Copy link
Member Author

jlebon commented Dec 8, 2021

Discussed with @dustymabe about this and realized to be nice we should have the Butane config we send in the email work both before and after the migration. I've updated the PR now to defend against users writing the stamp file via Ignition as an API to switch to legacy post-migration.

That said, nothing prevents them from just manually creating the stamp file after provisioning and rebooting. We'd probably need a separate stamp file to fix that, but again, I'm not convinced it's worth it. I'm increasingly thinking we should make a barrier just for this, so then we can just rip it out after one release.

Also updated email draft.

@dustymabe
Copy link
Member

I'm increasingly thinking we should make a barrier just for this, so then we can just rip it out after one release.

+1 for a barrier.

dustymabe
dustymabe previously approved these changes Dec 8, 2021
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

From `overlay.d/README.md`:

Contains systemd service and script for remaining on iptables-nft after
the migration to nft.

Split out because (1) it will roll out to next first, and (2) it can
more easily be deleted after the barrier release.

For more details, see:
coreos/fedora-coreos-tracker#676
coreos#1324
This is prep for enabling iptables-nft in `next`.

Because tests are shared between streams, this is a bit awkward. The way
this does it is:
- Make the iptables-legacy test exclusive and attach a Butane config
  that sets the legacy symlinks. On next, this will verify that this
  config can be used to boot into legacy. On !next, this will verify
  that the config can safely be used even before migration.
- Add an iptables non-exclusive test. On next, this will verify that the
  default backend is nft. On !next, it will verify that it is legacy.
  Once the migration is over on all streams, the latter check will be
  removed, so it'll purely check for nft.
@jlebon jlebon marked this pull request as ready for review February 1, 2022 19:13
@jlebon
Copy link
Member Author

jlebon commented Feb 1, 2022

Updated and tested!

@jlebon
Copy link
Member Author

jlebon commented Feb 1, 2022

Let's write up a checklist in coreos/fedora-coreos-tracker#676 to detail how we're going to roll this out.

dustymabe
dustymabe previously approved these changes Feb 1, 2022
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Feb 1, 2022
Ship with iptables-nft by default. This requires a postprocessing script
until we can fully drop iptables-legacy from the base.

For more information, see:
coreos/fedora-coreos-tracker#676
coreos#1324
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Feb 1, 2022
This overlay contains a script allowing existing nodes to opt-out from
the iptables-nft migration.

We should remove it after a barrier release.

The overlay was added in
coreos#1324.
jlebon added a commit to jlebon/fedora-coreos-config that referenced this pull request Feb 1, 2022
This overlay contains a script allowing existing nodes to opt-out from
the iptables-nft migration.

We should remove it after a barrier release.

The overlay was added in
coreos#1324.
These are manual upgrade tests that verify various upgrade paths for
iptables-nft.

It's manual in that you have to update the `OCIARCHIVE_URL` to point to
a URL of an ociarchive of a build with `35coreos-iptables`.

Long-term, I'd like to add external tests support directly in upgrade
tests so that we could have access to the artifacts vis e.g.
`KOLA_EXT_DATA` or a mount. But for now, this will do.

To run the tests, first update `OCIARCHIVE_URL`, and then:

```
    kola run -E /path/to/tests/manual/iptables-nft-migration ext.iptables-nft-migration.*
```
@jlebon jlebon changed the title Default to iptables-nft and support migrating back to legacy overlay.d: add 35coreos-iptables Feb 16, 2022
@jlebon jlebon enabled auto-merge (rebase) February 16, 2022 21:18
@jlebon jlebon removed the hold label Feb 16, 2022
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jlebon jlebon merged commit e000ea0 into coreos:testing-devel Feb 16, 2022
jlebon added a commit that referenced this pull request Feb 16, 2022
From `overlay.d/README.md`:

Contains systemd service and script for remaining on iptables-nft after
the migration to nft.

Split out because (1) it will roll out to next first, and (2) it can
more easily be deleted after the barrier release.

For more details, see:
coreos/fedora-coreos-tracker#676
#1324
dustymabe pushed a commit to jlebon/fedora-coreos-config that referenced this pull request Feb 21, 2022
Ship with iptables-nft by default. This requires a postprocessing script
until we can fully drop iptables-legacy from the base.

For more information, see:
coreos/fedora-coreos-tracker#676
coreos#1324
dustymabe pushed a commit to jlebon/fedora-coreos-config that referenced this pull request Feb 21, 2022
This overlay contains a script allowing existing nodes to opt-out from
the iptables-nft migration.

We should remove it after a barrier release.

The overlay was added in
coreos#1324.
dustymabe pushed a commit that referenced this pull request Feb 21, 2022
Ship with iptables-nft by default. This requires a postprocessing script
until we can fully drop iptables-legacy from the base.

For more information, see:
coreos/fedora-coreos-tracker#676
#1324
dustymabe pushed a commit that referenced this pull request Feb 21, 2022
This overlay contains a script allowing existing nodes to opt-out from
the iptables-nft migration.

We should remove it after a barrier release.

The overlay was added in
#1324.
elemental-lf added a commit to elemental-lf/typhoon that referenced this pull request Sep 2, 2022
kube-proxy and Calico are still using the legacy implementation, so we want our custom rules
to go into the same tables.

See:
  * https://docs.fedoraproject.org/en-US/fedora-coreos/major-changes/#_moving_to_iptables_nft
  * coreos/fedora-coreos-tracker#676
  * coreos/fedora-coreos-config#1324
@jlebon jlebon deleted the pr/iptables-nft branch April 23, 2023 23:29
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
From `overlay.d/README.md`:

Contains systemd service and script for remaining on iptables-nft after
the migration to nft.

Split out because (1) it will roll out to next first, and (2) it can
more easily be deleted after the barrier release.

For more details, see:
coreos/fedora-coreos-tracker#676
coreos#1324
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this pull request Oct 10, 2023
From `overlay.d/README.md`:

Contains systemd service and script for remaining on iptables-nft after
the migration to nft.

Split out because (1) it will roll out to next first, and (2) it can
more easily be deleted after the barrier release.

For more details, see:
coreos/fedora-coreos-tracker#676
coreos#1324
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants