manifest: Add conntrack (tools) but without the daemon #502

cgwalters · 2021-02-19T16:44:43Z

coreos/fedora-coreos-tracker#404
https://bugzilla.redhat.com/show_bug.cgi?id=1925698
openshift/machine-config-operator#2421

This will help us work around a suspected kernel bug for OpenShift right now. We may remove this later.

travier · 2021-02-19T16:52:04Z

Will need a release-4.7 backport

LorbusChris · 2021-02-19T16:52:39Z

I wonder whether this should be done in upstream FCOS for now. Things like this would otherwise have to be managed in https://github.com/openshift/okd-machine-os/, broadening the gap between FCOS and the OKD machine OS.

cgwalters · 2021-02-19T16:55:34Z

I agree this is creating an OKD gap; we could address that by having okd-machine-os just install the conntrack-tools package alongside the rest of the stuff like kubelet etc.

I am less certain about blocking this quick-fix-for-OCP on adding this to FCOS - that's basically a permanent commitment, although sentiment seemed to be in favor-ish.

jlebon · 2021-02-19T17:11:12Z

As a quick hack, this seems fine to me!

/approve

But long-term we should either get that package split out, or e.g. moved to the MCD as was discussed. Given that, I'd rather not do this hack at all in FCOS if we can, because remove-from-packages is really not great (and honestly, we should be looking at dropping the ones we currently have).

jlebon · 2021-02-19T17:14:47Z

> Things like this would otherwise have to be managed in openshift/okd-machine-os, broadening the gap between FCOS and the OKD machine OS

Because FCOS is more general than RHCOS, I think some gap will basically always exist, and we should embrace that and figure out how best to manage it (e.g. like the extensions strengthening work).

lucab · 2021-02-19T17:31:06Z

This is going to change cri-o behavior, which performs some networking-related logic based on auto-detection of binary presence.
Ideally, before landing this, there should be a tri-state knob (on | off | autodetect) in cri-o configuration so that the expected behavior can be pinned.

cgwalters · 2021-02-19T17:33:10Z

> This is going to change cri-o behavior,

Right but...is that code always something we want anyways? I am not sure.

One option to make this even more obviously a hack for current OCP is to move the binary to e.g. /usr/opt/openshift-private/bin/conntrack, and that would also defeat crio finding it too - if indeed that is a problem.

mrunalp · 2021-02-19T21:00:31Z

@aojea ptal from the crio hostport manager perspective.

aojea · 2021-02-19T21:20:30Z

> This is going to change cri-o behavior, which performs some networking-related logic based on auto-detection of binary presence.
> Ideally, before landing this, there should be a tri-state knob (on | off | autodetect) in cri-o configuration so that the expected behavior can be pinned.

cri-o manages the hostports of the containers. Without the conntrack binary, crio has a bug in certain UDP scenarios and it also fails one e2e test (kubernetes/kubernetes#91216).

Honestly, I don't know why things are this way, but the correct behaviour for crio, since it is the owner of the hostport logic, is to use the conntrack binary to delete the stale entries ... the conntrack logic doesn't modify any behaviour, it fixes a bug.

/lgtm

mrunalp · 2021-02-19T21:37:48Z

@aojea Thanks! Now the question is if the crio code is enough for the gcp issue or we still need the changes proposed in openshift/machine-config-operator#2421 cc: @michaelgugino

aojea · 2021-02-19T21:40:52Z

> @aojea Thanks! Now the question is if the crio code is enough for the gcp issue or we still need the changes proposed in openshift/machine-config-operator#2421 cc: @michaelgugino

That issue is orthogonal to crio: it is caused by TCP connections, whereas this one is about stale UDP connections.

cgwalters · 2021-02-19T21:41:22Z

/retest

openshift-bot · 2021-02-19T22:11:05Z

/retest