
CI: parallel-safe network system test #23694

Merged: 1 commit into containers:main on Aug 30, 2024

Conversation

edsantiago
Member

  • replace random_string with safename in container/network names
  • add ci:parallel tags where possible.
    • where not possible, add explanations

Signed-off-by: Ed Santiago [email protected]
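For illustration, the renaming/tagging pattern described above looks roughly like this (a sketch, not an excerpt from this PR; the test body is made up, and the bats test_tags comment is assumed to be how the ci:parallel tag is applied):

# Before: names built from random_string can collide across parallel workers
#   mynet=testnet-$(random_string 10)
#
# After: safename yields a name unique to this test, and the tag below
# opts the test in to parallel execution in CI.
# bats test_tags=ci:parallel
@test "illustrative network create/rm" {
    mynet="testnet-$(safename)"
    run_podman network create $mynet
    run_podman network rm $mynet
}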


@openshift-ci openshift-ci bot added the release-note-none and approved labels Aug 21, 2024
@@ -295,6 +302,7 @@ load helpers.network
run_podman 1 network rm $mynetname
}

# CANNOT BE PARALLELIZED due to iptables/nft commands
Member

Also, podman network reload --all would cause flakes in other tests doing network connectivity checks.

@@ -957,14 +979,15 @@ function wait_for_restart_count() {
}

# Test for https://github.com/containers/podman/issues/18615
# CANNOT BE PARALLELIZED due to strict checking of /run/netns
Member

I wonder how much value this check adds here. I do want to verify that we do not leak the netns files, but we only do so here, and in an ugly way IMO. Podman uses a consistent netns file name, so I wonder if we should just add such a check to the leak_check at the end of the suite. Then it would cover all tests, not just this one. There should never be a reason for us to leak the file. It would be hard to know where it was leaked in case of failures, but I guess if leaks happen we can instrument teardown to ls /run/netns and notice where the leaking starts...

Of course, this is out of scope for this PR.

Member Author

Well... this comment does have your name on it:

# This is racy if other programs modify /run/netns while the test is running.
# However I think the risk is minimal and I think checking for this is important.

I'm all for adding a leak check, but what would that look like? Right now on my laptop I see six /run/netns/netns* files with no tests running. Looks like I have some orphan pasta processes

Member

I know this was added by me; my comment was more that, if it is blocking parallel tests, this may not be a good place for it, because checking for leaks in one place doesn't help much if we still leak in other places.

I'm all for adding a leak check, but what would that look like? Right now on my laptop I see six /run/netns/netns* files with no tests running. Looks like I have some orphan pasta processes

pasta should not be used as root, so I don't think they are related to each other. If there are leaks right now then we need to debug/fix them first, of course. Anyhow, as said, not in this PR. I will file a new issue to consider this.
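For illustration, such a suite-level check might look roughly like this (a sketch only, not something this PR adds; the function name, the netns-* filename pattern, and the die helper are assumptions):

function netns_leak_check() {
    # Anything still present in /run/netns after the suite finishes is a
    # leak candidate. (Assumes podman names its netns files netns-<random>.)
    local leaked
    leaked=$(ls /run/netns 2>/dev/null | grep '^netns-' || true)
    if [[ -n "$leaked" ]]; then
        die "leaked network namespace(s): $leaked"
    fi
}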

Member

@Luap99 Luap99 left a comment

LGTM

Contributor

openshift-ci bot commented Aug 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ip netns add $netns
run_podman create --network=ns:/var/run/netns/$netns $IMAGE
cid=${output}
run_podman inspect --format '{{ .NetworkSettings.Networks }}' $cid
is "$output" 'map[]' "NeworkSettings should be empty"
run_podman rm $cid
echo "$_LOG_PROMPT ip netns delete $netns"
ip netns delete $netns
Member Author

This fixes a leak. Found by a leak check I've added (experimental) to #23275.

Member

Not blocking, but doesn't this mean this always leaks the netns when one of the checks above fails?
Should this be moved into a custom teardown function?

Member Author

Maybe, but what's different about here versus all the other places that leak a netns on failure? Like this one (scroll to bottom): every time 600 shell completion fails, it leaks four netns, the UUID-named ones, even though the network create in that test uses safename. It has been bothering me for a while, but I've only spent minimal time investigating.

Anyhow, if netns cleanup is more important than other container-related cleanup, I'm happy to move that into teardown somehow.
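For illustration, such a teardown might look roughly like this (a sketch; it assumes the test stores the namespace name in a global $netns variable and that the file's teardown() chains to basic_teardown):

function teardown() {
    # Delete the test-created namespace even if an assertion above failed.
    if [[ -n "${netns:-}" ]]; then
        ip netns delete $netns || true
        unset netns
    fi
    basic_teardown
}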

Member

If the netns files created by podman are leaked then that is a podman bug; even in the error paths we should be removing them.

I don't think it is super important, as it would only leak in error cases, and there are no problems with leaks other than a bit of kernel memory keeping it around. And a reboot fixes that, so I think that is fine for the machines where these tests are run.

- replace random_string with safename in container/network names
- add ci:parallel tags where possible.
  - where not possible, add explanations
- fix a userns leak

Signed-off-by: Ed Santiago <[email protected]>
@Luap99
Member

Luap99 commented Aug 30, 2024

/lgtm

Given this has been up for one week without another review, I am going to merge it.

@openshift-ci openshift-ci bot added the lgtm label Aug 30, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit ece759a into containers:main Aug 30, 2024
61 checks passed
@edsantiago edsantiago deleted the safename-500 branch August 30, 2024 13:23
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR label Nov 29, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Nov 29, 2024