-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
overlay.d: add 35coreos-iptables #1324
Conversation
Need to add tests for this. But also, we can't merge this until we actually officially start the deprecation window and wait the 3 months. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @jlebon. A few comments/questions:
overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
LEGACY_TARGETS="ip6tables-legacy ip6tables-legacy-restore ip6tables-legacy-save iptables-legacy iptables-legacy-restore iptables-legacy-save" | ||
|
||
# sanity-check the stamp file is present | ||
if [ ! -e /sysroot/etc/fedora-coreos/iptables-legacy.stamp ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.. We're starting to use the /etc/fedora-coreos
folder for things now. I really think we should create a docs page for everything we've ever put in this folder (i..e will be useful in the future) that explains or links to reasonings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me! Let's try to find someone to tackle this bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in 05core
though. Should it be in 15fcos
instead? If not, maybe we should be using /etc/coreos
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This migration specifically, RHCOS has already done it. Though I like the idea of keeping the door open for things that do also affect RHCOS. So leaning towards using /etc/coreos
, and putting the specific things that leverage the directory in either 05core
or 15fcos
depending on whether it needs to apply to RHCOS too (so this one for example, would go in 15fcos
). Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM!
overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
overlay.d/05core/usr/lib/dracut/modules.d/35coreos-network/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
Description=CoreOS Enable iptables-legacy | ||
ConditionPathExists=/etc/initrd-release | ||
DefaultDependencies=false | ||
ConditionPathExists=/sysroot/etc/fedora-coreos/iptables-legacy.stamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we limit this unit to running only once? It only needs to run once, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good question. I think it would be less confusing if it's "bound" to the stamp file that already exists instead of creating another one. One case I can imagine is someone upgrading to an iptables-nft-default release, then deploying back an iptables-legacy-default release and then upgrading again. If we ran once only, we wouldn't reapply the legacy backend, but we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of torn on this one. I like the use of a systemd unit like this for "migration" or "run once" type stuff but not as much for an every boot kind of thing. For example, what if someone decides in the future they want to switch to using NFT and they log into their systems and fix the symlinks themselves (not realizing /sysroot/etc/fedora-coreos/iptables-legacy.stamp
) exists. After a reboot they're back to legacy iptables.
This is a contrived example, but what I'm trying to argue for is not having 2nd order configuration (i.e. configuration that dynamically controls other configuration) that runs every boot. It starts to get confusing real quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is that I don't want to make this conditional on another stamp file. So maybe it's simplest if we just have the script delete it once it's done. Then it's naturally one shot. It also solves my other concern of the stamp file living forever and possibly not being in sync with what's in /etc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I actually kind of like having a second stamp file :). Mainly because there is one that was the original request (requesting for the service to do something) and there is a second one to let us know the service ran (in case the journal had been wrapped and we no longer have that information in journalctl
). I agree it's not great the have the stamp files hang around forever, but then again it could be nice (given our new use of /etc/fedora-coreos/
) to add a datestamp (or some other time indicator) to the stamp files (i.e. /etc/fedora-coreos/202203-iptables-legacy.stamp
) and we can just let them collect, almost like a little log of system history. Given that we periodically have migrations like this, it could be useful to see them all in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this and I think we might be tackling this slightly wrong. This is introducing an API that allows new installs to default to iptables-legacy using a stamp file. Do we want to commit to that? For how long? I think instead it should be a documented Butane snippet setting the symlinks (or if we really want, some Butane sugar for it).
So then this here would be concerned strictly about upgrading nodes. In which case we can delete this script after the next barrier.
Re. keeping the stamp file in /etc
as a "log", I'm still not convinced. :) But we could also add a stamp file in /var/lib
as a marker that we ran if we really want something more permanent (but note also that ostree admin config-diff
is the final source of truth for what local changes have been made anyway).
@bgilbert Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on not mutating configuration on every boot. Also agree that this is currently introducing an API with unbounded lifetime, sort of an "anti-feature flag". Given the lack of consensus on the feature flags proposal in coreos/fedora-coreos-tracker#892, I think it'd be better to avoid that. The same reasoning mitigates against Butane sugar. A documented Butane snippet for new installs seems reasonable to me; it's consistent with other migrations we've done in the past.
As to retaining the upgrade stamp forever, I'm also not convinced. The symlinks themselves are the source of truth for which mode we're in, and how we got there seems less relevant. When old journal entries age out, by definition that's an indication that we no longer care about the corresponding slice of historical record, so I don't think we need to go out of our way to stash the info somewhere else.
as part of this PR let's also update the |
bccb410
to
278b605
Compare
Updated this, but haven't sanity-checked manually myself. Let's see what CI says. |
a8c1770
to
3f45e3e
Compare
Rebased and tweaked slightly based on discussions. Still need to address feedback from #1324 (comment). |
f47ce46
to
116c83c
Compare
Now with #1324 (comment) addressed! I've also updated the email draft to reflect the latest approach. Let's try to send it this week? |
overlay.d/15fcos/usr/lib/dracut/modules.d/35coreos-iptables/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
local symlink=$1; shift | ||
# check that the deployment is still using the symlink (i.e. the user didn't | ||
# do something funky), and that the OSTree default is still symlink-based | ||
# (i.e. that we didn't change strategy and forgot to update this script) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# (i.e. that we didn't change strategy and forgot to update this script)
Should we hard error in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not easy to tell those two cases apart though (this is related to the other comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But also in general, I think for this we should try to be really good at detecting the state we expect, but otherwise just leave everything untouched without alarming the user. These updating machines could be ancient and carry a lot of state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I think I was mostly just assuming if this hard fails we'll catch it in CI and resolve the problem. I agree, though, that if we wanted that it should be part of a CI test and not part of these units.
overlay.d/15fcos/usr/lib/dracut/modules.d/35coreos-iptables/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
overlay.d/15fcos/usr/lib/dracut/modules.d/35coreos-iptables/coreos-enable-iptables-legacy.sh
Outdated
Show resolved
Hide resolved
echo "Executable $path no longer present; exiting." | ||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this should be a hard error. If this is true we failed IMO. Maybe we should add an appropriate check in our kola tests to make sure this is never true... Or I guess iptables --version | grep legacy
(in the kola test already) is sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to be resilient to e.g. some layered package or some customization having changed the tree (or e.g. with coreos/enhancements#7, that'd be trivially possible though this script probably won't live that long).
Or I guess
iptables --version | grep legacy
(in the kola test already) is sufficient?
👍 Agreed.
Carrying forward the converstaion from this thread in #1324 (comment) you said This is introducing an API that allows new installs to default to iptables-legacy using a stamp file. Is that not still true in the current code? If not, what mechanism is preventing someone from creating the stamp file using Ignition and having the service run? |
Yeah, I've thought about this and tried to find a succinct way to block it without introducing yet another stamp file, but couldn't think of one. I don't think it's a real concern though, do you? The way it's set up, it's clearly not ergonomic to use that way and we do provide the Butane snippet to use. There's always a chance someone can abuse or disable the safeguards you've put in place so it didn't seem productive to worry about it. |
116c83c
to
0beed61
Compare
What about using the aleph-version as another knob for the script?
So we could allow up to a certain version. Somewhat related: we should probably have a library or something for comparing versions of FCOS. |
Hmm yeah, maybe...It'd have to be a version after or equal to the one where we switch to nft. I guess we edit this PR at the last minute when it's ready to be merged, and use the last release version numbers? Not sure. I think it'd be harder to screw up if we just add a stamp file if we really care about this (which I'm still not sure we should). |
This is very similar to the discussion in coreos/fedora-coreos-tracker#892 (comment). |
0beed61
to
81782b1
Compare
Discussed with @dustymabe about this and realized to be nice we should have the Butane config we send in the email work both before and after the migration. I've updated the PR now to defend against users writing the stamp file via Ignition as an API to switch to legacy post-migration. That said, nothing prevents them from just manually creating the stamp file after provisioning and rebooting. We'd probably need a separate stamp file to fix that, but again, I'm not convinced it's worth it. I'm increasingly thinking we should make a barrier just for this, so then we can just rip it out after one release. Also updated email draft. |
+1 for a barrier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
81782b1
to
1e501ef
Compare
From `overlay.d/README.md`: Contains systemd service and script for remaining on iptables-nft after the migration to nft. Split out because (1) it will roll out to next first, and (2) it can more easily be deleted after the barrier release. For more details, see: coreos/fedora-coreos-tracker#676 coreos#1324
This is prep for enabling iptables-nft in `next`. Because tests are shared between streams, this is a bit awkward. The way this does it is: - Make the iptables-legacy test exclusive and attach a Butane config that sets the legacy symlinks. On next, this will verify that this config can be used to boot into legacy. On !next, this will verify that the config can safely be used even before migration. - Add an iptables non-exclusive test. On next, this will verify that the default backend is nft. On !next, it will verify that it is legacy. Once the migration is over on all streams, the latter check will be removed, so it'll purely check for nft.
7f6224c
to
63e2f81
Compare
Updated and tested! |
Let's write up a checklist in coreos/fedora-coreos-tracker#676 to detail how we're going to roll this out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ship with iptables-nft by default. This requires a postprocessing script until we can fully drop iptables-legacy from the base. For more information, see: coreos/fedora-coreos-tracker#676 coreos#1324
This overlay contains a script allowing existing nodes to opt-out from the iptables-nft migration. We should remove it after a barrier release. The overlay was added in coreos#1324.
This overlay contains a script allowing existing nodes to opt-out from the iptables-nft migration. We should remove it after a barrier release. The overlay was added in coreos#1324.
These are manual upgrade tests that verify various upgrade paths for iptables-nft. It's manual in that you have to update the `OCIARCHIVE_URL` to point to a URL of an ociarchive of a build with `35coreos-iptables`. Long-term, I'd like to add external tests support directly in upgrade tests so that we could have access to the artifacts vis e.g. `KOLA_EXT_DATA` or a mount. But for now, this will do. To run the tests, first update `OCIARCHIVE_URL`, and then: ``` kola run -E /path/to/tests/manual/iptables-nft-migration ext.iptables-nft-migration.* ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
From `overlay.d/README.md`: Contains systemd service and script for remaining on iptables-nft after the migration to nft. Split out because (1) it will roll out to next first, and (2) it can more easily be deleted after the barrier release. For more details, see: coreos/fedora-coreos-tracker#676 #1324
Ship with iptables-nft by default. This requires a postprocessing script until we can fully drop iptables-legacy from the base. For more information, see: coreos/fedora-coreos-tracker#676 coreos#1324
This overlay contains a script allowing existing nodes to opt-out from the iptables-nft migration. We should remove it after a barrier release. The overlay was added in coreos#1324.
Ship with iptables-nft by default. This requires a postprocessing script until we can fully drop iptables-legacy from the base. For more information, see: coreos/fedora-coreos-tracker#676 #1324
This overlay contains a script allowing existing nodes to opt-out from the iptables-nft migration. We should remove it after a barrier release. The overlay was added in #1324.
kube-proxy and Calico are still using the legacy implementation, so we want our custom rules to go into the same tables. See: * https://docs.fedoraproject.org/en-US/fedora-coreos/major-changes/#_moving_to_iptables_nft * coreos/fedora-coreos-tracker#676 * coreos/fedora-coreos-config#1324
From `overlay.d/README.md`: Contains systemd service and script for remaining on iptables-nft after the migration to nft. Split out because (1) it will roll out to next first, and (2) it can more easily be deleted after the barrier release. For more details, see: coreos/fedora-coreos-tracker#676 coreos#1324
From `overlay.d/README.md`: Contains systemd service and script for remaining on iptables-nft after the migration to nft. Split out because (1) it will roll out to next first, and (2) it can more easily be deleted after the barrier release. For more details, see: coreos/fedora-coreos-tracker#676 coreos#1324
From
overlay.d/README.md
:Contains systemd service and script for remaining on iptables-nft after
the migration to nft.
Split out because (1) it will roll out to next first, and (2) it can
more easily be deleted after the barrier release.
For more details, see:
coreos/fedora-coreos-tracker#676
#1324