New container tests fail intermittently on Aarch64 #835

Closed
troglobit opened this issue Nov 26, 2024 · 0 comments · Fixed by #843
Labels
bug Something isn't working

Comments

@troglobit
Contributor

The new use-case tests ospf-container and container-firewall-basic fail intermittently on Aarch64 test systems. The failures result in "inconsistent container state" errors from Podman and manifest as a failure to create or start at least one of the containers.

Further investigation shows that the speed at which configuration is applied through any of the M2M interfaces exposes flaws in the asynchronous handling of container creation/deletion. When a user adds or deletes a container, the actual job is enqueued to execd, so a new configuration may be applied before execd has completed the previous one.

This requires a redesign/simplification to ensure the M2M interfaces wait for, at least, the deletion and cleanup of containers before applying a new configuration.
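
To illustrate the ordering problem, here is a minimal Python sketch, not the actual confd/execd code. `JobQueue` is a hypothetical stand-in for an asynchronous job runner such as execd, and `reconfigure` is a hypothetical caller that waits for the container cleanup to finish before it touches the interface the container was using. All names and the timeout value are assumptions made for the example.

```python
import queue
import threading


class JobQueue:
    """Hypothetical stand-in for an asynchronous job runner (e.g. execd)."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Run queued jobs one at a time, in the order they were submitted.
        while True:
            func, done = self._jobs.get()
            try:
                func()
            except Exception as err:
                print(f"job failed: {err}")
            finally:
                done.set()  # signal completion even if the job failed

    def submit(self, func):
        """Enqueue a job; return an Event that is set once it has run."""
        done = threading.Event()
        self._jobs.put((func, done))
        return done


def reconfigure(jobs, log, name, ifname, timeout=5.0):
    """Delete container `name`, then remove the interface it was using."""
    done = jobs.submit(lambda: log.append(f"delete container {name}"))
    # Without this wait, the interface could be removed while the
    # container is still being torn down in the background.
    if not done.wait(timeout):
        raise TimeoutError(f"cleanup of {name} did not finish in {timeout}s")
    log.append(f"remove interface {ifname}")
    return log
```

Calling `reconfigure(JobQueue(), [], "web", "veth0")` records the container deletion before the interface removal. Dropping the `done.wait()` call would let the two steps race, which is the kind of ordering flaw described above.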

@troglobit troglobit converted this from a draft issue Nov 26, 2024
@troglobit troglobit added this to the Infix v24.11.1 milestone Nov 26, 2024
@troglobit troglobit added the bug Something isn't working label Nov 26, 2024
@troglobit troglobit self-assigned this Nov 26, 2024
troglobit added a commit that referenced this issue Nov 28, 2024
This is a complete redesign of how the system creates/deletes containers
from the running-config.  Containers are now removed synchronously from
confd before any interfaces they may be using are removed, and created
in parallel, using a Finit task, well after confd has finished setting
up interfaces.

The logic previously provided by execd to retry container create on any
route/address changes, or periodically every 60 seconds, is now handled
by a new 'setup' command in the container wrapper script.

Additionally, container create is now split into two steps: fetching the
image with wget/curl/podman pull, and 'podman create'.  This both
consolidates image fetching and improves user feedback, since most of
the retry logic (above) revolves around the image download.

Fixes #835

Signed-off-by: Joachim Wiberg <[email protected]>
@troglobit troglobit linked a pull request Nov 28, 2024 that will close this issue
17 tasks
@github-project-automation github-project-automation bot moved this from In progress to Done in Infix & C:o Nov 28, 2024
Projects
Status: Done