mantle/kola: Add function to enhance upgrade stability #3938

Merged
merged 1 commit into from
Dec 18, 2024
24 changes: 24 additions & 0 deletions mantle/kola/tests/upgrade/basic.go
@@ -313,10 +313,33 @@ func runFnAndWaitForRebootIntoVersion(c cluster.TestCluster, m platform.Machine,
}
}

func waitForUpgradeToBeStaged(c cluster.TestCluster, m platform.Machine) {
	// Set up a systemd path unit that watches /ostree/deploy, which
	// ostree updates behind the scenes whenever deployments change.
	// We treat this path as the canonical API for monitoring
	// deployment changes. When the path changes, refchanged.path
	// triggers refchanged.service, which stops wait.service.
	//
	// systemd-run --wait does not return (so execution does not
	// continue here) until wait.service has been stopped by
	// refchanged.service. This delays the start of the wait inside
	// runFnAndWaitForRebootIntoVersion until later in the upgrade
	// process: we are seeing failures due to timeouts, and we reduce
	// the variability by limiting the wait inside that function to
	// just the actual reboot.
	// https://github.com/coreos/fedora-coreos-tracker/issues/1805
	//
	// Note: if systemd-run ever gains the ability to --wait when
	// generating a path unit then the below can be simplified.
	c.RunCmdSync(m, "sudo systemd-run -u refchanged --path-property=PathChanged=/ostree/deploy systemctl stop wait.service")
	c.RunCmdSync(m, "sudo systemd-run --wait -u wait sleep infinity")
}

func waitForUpgradeToVersion(c cluster.TestCluster, m platform.Machine, version string) {
	runFnAndWaitForRebootIntoVersion(c, m, version, func() {
		// Start Zincati so it will apply the update
		c.RunCmdSync(m, "sudo systemctl start zincati.service")
		waitForUpgradeToBeStaged(c, m)
	})
}
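
The staging wait in this hunk pairs a transient path unit with a blocking transient service. As a minimal sketch (not part of this PR), the two command lines could be built from a watched path and two unit names. `stagedWaitCommands` and its parameters are hypothetical names introduced for illustration; the `systemd-run` flags are the ones already used in the diff.

```go
package main

import "fmt"

// stagedWaitCommands is a hypothetical helper (not in the PR). It returns
// the two shell commands used by the staging wait:
//   - arm:   a transient path unit that stops <waitUnit>.service when
//     watchPath changes
//   - block: a transient service that sleeps forever; --wait makes
//     systemd-run block until that service is stopped
func stagedWaitCommands(watchPath, pathUnit, waitUnit string) (arm, block string) {
	arm = fmt.Sprintf(
		"sudo systemd-run -u %s --path-property=PathChanged=%s systemctl stop %s.service",
		pathUnit, watchPath, waitUnit)
	block = fmt.Sprintf("sudo systemd-run --wait -u %s sleep infinity", waitUnit)
	return arm, block
}

func main() {
	arm, block := stagedWaitCommands("/ostree/deploy", "refchanged", "wait")
	// The path unit must be armed first, otherwise nothing would ever
	// stop the blocking service.
	fmt.Println(arm)
	fmt.Println(block)
}
```

The order matters: the blocking command only returns because the path unit armed before it eventually stops the sleeping service.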

@@ -328,6 +351,7 @@ func rpmostreeRebase(c cluster.TestCluster, m platform.Machine, ref, version str
		// we use systemd-run here so that we can test the --reboot path
		// without having SSH not exit cleanly, which would cause an error
		c.RunCmdSyncf(m, "sudo systemd-run rpm-ostree rebase --reboot %s", ref)
		waitForUpgradeToBeStaged(c, m)
Review comment (Member):

I'm not sure this makes sense here. In this case we're running rpm-ostree rebase synchronously. It'll have already done the deployment (and initiated a reboot) by the time you get to this line.

Reply (Member):

Not exactly. systemd-run is sending rpm-ostree off on its own little boat to finish independently, IIUC.

	})
}
