CI: e2e: fix checkpoint flake #24300

Merged

edsantiago
Member

Two flakes seen in the last three months. One of them was in
August, so it's not related to ongoing criu-4.0 problems.

Suspected cause: race waiting for "podman run --rm" container
to transition from stopped to removed.

Solution: allow a 5-second grace period, retrying every second.

Also: add explanations to the Expect()s, remove unnecessary
code, and tighten up the CID check.

Flake counts by category:
int(2) | remote(1), podman(1) | fedora-40(1), fedora-39(1) | root(2) | host(1), container(1) | boltdb(2)

Signed-off-by: Ed Santiago [email protected]

None

@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 16, 2024

Cockpit tests failed for commit 81b2f2f69e975c89ac8b7164db9ee92cb0c922e4. @martinpitt, @jelly, @mvollmer please check.

@martinpitt
Contributor

The cockpit F41 failure is the same as the one discussed in #24238 (comment) ff. It is reported as https://bugzilla.redhat.com/show_bug.cgi?id=2319310, with a workaround in cockpit-project/cockpit-podman#1883.

You can retry or ignore here.

test/e2e/checkpoint_test.go (outdated)
Comment on lines 746 to 753
Expect(podmanTest.NumberOfContainersRunning()).To(Equal(1), "# of running containers after restore")
Expect(podmanTest.NumberOfContainers()).To(Equal(1), "total # of containers after restore")
Expect(podmanTest.GetContainerStatus()).To(ContainSubstring("Up"))
Member

These checks are rather inefficient: each call to podmanTest... runs a podman ps internally, so all of these checks could be combined into a single podman ps to speed the test up. I know the gain is very small, so it is not worth fixing now, but for new tests at least I would call this an anti-pattern.
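
To illustrate the idea of deriving several assertions from one listing, here is a rough sketch. It is not the podman test framework's API: `summarizePs` and `psSummary` are hypothetical names, and the input format (one line per container, an ID followed by status text) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// psSummary holds the counts that the separate checks would each
// have obtained with their own listing call.
type psSummary struct {
	Total   int
	Running int
}

// summarizePs (hypothetical helper) parses one captured listing, assumed
// to contain one "<id> <status text>" line per container, and counts all
// containers plus those whose status text starts with "Up".
func summarizePs(output string) psSummary {
	var s psSummary
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		s.Total++
		_, status, found := strings.Cut(line, " ")
		if found && strings.HasPrefix(strings.TrimSpace(status), "Up") {
			s.Running++
		}
	}
	return s
}

func main() {
	// Made-up sample listing, for illustration only.
	listing := "abc123 Up 2 seconds ago\ndef456 Exited (0) 1 second ago\n"
	s := summarizePs(listing)
	fmt.Printf("total=%d running=%d\n", s.Total, s.Running)
}
```

With the listing captured once, the total count, the running count, and the status check can all be asserted against the same data, which also avoids the state changing between calls.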

Member

Did you mean to remove these as you added the one podman ps check above?

Member Author

Argh. Yes, of course I did. Thank you for catching that.

@edsantiago edsantiago force-pushed the flake-fix-checkpoint-test branch from 81b2f2f to 3aabb08 Compare October 17, 2024 11:51
@edsantiago edsantiago force-pushed the flake-fix-checkpoint-test branch from 3aabb08 to fa920f5 Compare October 17, 2024 12:40
Member

@Luap99 Luap99 left a comment

LGTM

Contributor

openshift-ci bot commented Oct 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member

@Luap99 Luap99 left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 290d94d into containers:main Oct 18, 2024
54 checks passed
@edsantiago edsantiago deleted the flake-fix-checkpoint-test branch October 21, 2024 12:02
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release-note-none
3 participants