manifest: Add conntrack (tools) but without the daemon #502
coreos/fedora-coreos-tracker#404 https://bugzilla.redhat.com/show_bug.cgi?id=1925698 openshift/machine-config-operator#2421 This will help us work around what is believed to be a kernel bug affecting OpenShift right now. We may remove this later.
Will need a |
I wonder whether this should be done in upstream FCOS for now. Things like this would otherwise have to be managed in https://github.com/openshift/okd-machine-os/, broadening the gap between FCOS and the OKD machine OS |
I agree this is creating an OKD gap; we could address that by having okd-machine-os just install the conntrack-tools package alongside the rest of the stuff like kubelet etc. I am less certain about blocking this quick-fix-for-OCP on adding this to FCOS - that's basically a permanent commitment, although sentiment seemed to be in favor-ish. |
As a quick hack, this seems fine to me! /approve But long-term we should either get that package split out, or e.g. moved to the MCD as was discussed. Given that, I'd rather not do this hack at all in FCOS if we can, because |
Because FCOS is more general than RHCOS, I think some gap will basically always exist and we should embrace that and figure out how to manage it best (e.g. like the extensions strengthening work), |
This is going to change cri-o behavior, which performs some networking-related logic based on auto-detection of binary presence. |
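To illustrate the concern about auto-detection, here is a minimal sketch of how a runtime might switch behavior depending on whether a `conntrack` binary is on `PATH`. This is not cri-o's actual code; `find_conntrack`, `cleanup_mode`, and the mode names are hypothetical and exist only for illustration.

```python
import shutil
from typing import Callable, Optional

# Hypothetical helper: a lookup function is injected so the behavior can be
# exercised without depending on what is installed on the host.
Lookup = Callable[[str], Optional[str]]


def find_conntrack(which: Lookup = shutil.which) -> Optional[str]:
    """Return the path of a `conntrack` binary on PATH, or None if absent."""
    return which("conntrack")


def cleanup_mode(which: Lookup = shutil.which) -> str:
    """Pick a (hypothetical) behavior based purely on binary presence."""
    if find_conntrack(which) is not None:
        return "flush-stale-entries"
    return "skip"
```

The point of the sketch is that shipping the package changes which branch is taken, even though no configuration changed.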
Right but...is that code always something we want anyways? I am not sure. One option to make this even more obviously a hack for current OCP is to move the binary to e.g. |
@aojea ptal from the crio hostport manager perspective. |
cri-o manages the hostport in the containers. Without the conntrack binary, cri-o has a bug in certain UDP scenarios, and it also fails one e2e test (kubernetes/kubernetes#91216). Honestly, I don't know why it was set up this way, but since cri-o owns the hostport logic, the correct behaviour is for it to use the conntrack binary to delete the stale entries ... the conntrack logic doesn't modify any behaviour, it fixes a bug. /lgtm |
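As a rough sketch of what "use the conntrack binary to delete the stale entries" can look like, the snippet below builds a `conntrack -D` invocation that removes UDP entries for a given host port. The function names are hypothetical and this is not cri-o's implementation; it only assumes the standard `conntrack` CLI options `-D`, `-p`, and `--dport`.

```python
import subprocess
from typing import Callable, List


def stale_udp_delete_args(host_port: int) -> List[str]:
    """Build the argv that deletes UDP conntrack entries for one host port."""
    if not 0 < host_port < 65536:
        raise ValueError(f"invalid port: {host_port}")
    return ["conntrack", "-D", "-p", "udp", "--dport", str(host_port)]


def delete_stale_udp_entries(
    host_port: int,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> int:
    """Run the delete and return the exit status.

    check=False is used so the caller decides how to treat a non-zero
    status instead of getting an exception.
    """
    result = run(stale_udp_delete_args(host_port), check=False)
    return result.returncode
```

Running this for real requires root and the conntrack-tools package, which is exactly the dependency this PR adds.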
@aojea Thanks! Now the question is whether the cri-o code is enough for the GCP issue, or whether we still need the changes proposed in openshift/machine-config-operator#2421. cc: @michaelgugino |
That issue is orthogonal to cri-o: it is caused by TCP connections, whereas this one is about stale UDP connections. |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
/hold |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aojea, ashcrow, cgwalters, jlebon, travier
/retest |
Ohh I see the problem:
Edit: pinged DPTP to scale up nodes with kvm. |
We traced this to openshift/release@7a961fa, which steered jobs for this repo back to build01, so we lost nested virt. DPTP is investigating a fix. |
openshift/release#16193 merged, let's see |
OK now we ran on build02, but are missing |
/retest |
Now retrying after openshift/release#16197 |
/retest |
I added some more debugging |
/retest |
@cgwalters: The following test failed, say
/retest |
So nice to have working CI again! Thank you so much @cgwalters! /hold cancel |