Synchronize() can miss pod sandboxes that are in the process of being created, leading to missing PodSandbox events #63
Comments
/cc @klihub
We'll want to trace through and understand the other situations that can leave pods in StateUnknown. Without looking at the code right now, my understanding is that StateUnknown is also used during containerd restarts. We might also want to publish some best practices for writing NRI plugins (and event-based systems in general), something like:
NRI plugins can themselves crash (or may need to be updated while containerd is running), so they need to be able to bootstrap and maintain correct state throughout their lifecycle.
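To make that best practice a bit more concrete, here is a minimal sketch of how a plugin might keep its own pod view. It assumes the plugin rebuilds its state from the synchronization snapshot and treats every later event as a chance to learn about a pod it has not seen yet. `Pod`, `PodTracker`, `Synchronize` and `Observe` are hypothetical names for illustration only, not the NRI API.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical, simplified pod record; it is not the NRI API type.
type Pod struct {
	ID   string
	Name string
}

// PodTracker holds the plugin's view of the pods known to the runtime.
type PodTracker struct {
	mu   sync.Mutex
	pods map[string]Pod
}

// NewPodTracker returns an empty tracker.
func NewPodTracker() *PodTracker {
	return &PodTracker{pods: make(map[string]Pod)}
}

// Synchronize discards any stale view and rebuilds it from the snapshot
// received at registration time.
func (t *PodTracker) Synchronize(snapshot []Pod) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pods = make(map[string]Pod, len(snapshot))
	for _, p := range snapshot {
		t.pods[p.ID] = p
	}
}

// Observe records the pod carried by any later pod or container event.
// It reports true when the pod was not known before, which is how a pod
// missed during synchronization can still be picked up.
func (t *PodTracker) Observe(p Pod) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	_, known := t.pods[p.ID]
	t.pods[p.ID] = p
	return !known
}

// Len returns the number of tracked pods.
func (t *PodTracker) Len() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.pods)
}

func main() {
	tracker := NewPodTracker()
	tracker.Synchronize([]Pod{{ID: "a", Name: "pod-a"}})

	// A later event refers to pod "b", which was absent from the snapshot.
	if tracker.Observe(Pod{ID: "b", Name: "pod-b"}) {
		fmt.Println("late discovery of pod-b; tracked pods:", tracker.Len())
	}
}
```

Because `Observe` is idempotent, the plugin can call it from every event handler without caring whether the pod was already delivered some other way.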
I need to check whether it is possible to differentiate between a transient unknown state (pod being created) and non-transient ones. If it is, then in principle we can try to be more selective, and more correct, about filtering pods in unknown state during plugin synchronization. However, I think that wouldn't be enough to fully solve this problem. Even if plugin synchronization relayed pods in such a state, the pod information relayed would be incomplete. Since there are no post-* events for pods, this would not get corrected from the plugin's point of view until the next (pod or container) event involving the same pod occurs. So we'd also need to account for this internally and take some corrective measures at the end of pod creation, once the pod exits the transient state and is fully created. I suspect that an easier alternative could be to block plugin registration while a pod is being created.
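As a rough illustration of the "block registration while a pod is being created" idea, here is a sketch built on a read/write lock: pod creations hold the read side, registration takes the write side. `CreationGate` and its methods are hypothetical and not part of containerd or NRI. Note the trade-off baked into this approach: registration waits for as long as any creation is in flight.

```go
package main

import (
	"fmt"
	"sync"
)

// CreationGate is a hypothetical helper, not part of containerd or NRI.
// Pod creations take the read side of the lock, plugin registration the
// write side, so registration only runs when no creation is in flight.
type CreationGate struct {
	mu sync.RWMutex
}

// BeginPodCreation marks a pod creation as in flight. The caller must invoke
// the returned function once the pod has left its transient state.
func (g *CreationGate) BeginPodCreation() (done func()) {
	g.mu.RLock()
	return g.mu.RUnlock
}

// RegisterPlugin waits for all in-flight creations to finish, then runs
// synchronize while no new creation can start.
func (g *CreationGate) RegisterPlugin(synchronize func()) {
	g.mu.Lock()
	defer g.mu.Unlock()
	synchronize()
}

func main() {
	var gate CreationGate
	created := false
	seen := make(chan bool)

	done := gate.BeginPodCreation()
	// Registration starts while the creation is still in flight.
	go gate.RegisterPlugin(func() { seen <- created })

	created = true // the pod finishes being created...
	done()         // ...and only then can registration proceed

	fmt.Println("synchronize saw the fully created pod:", <-seen)
}
```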
Just my $0.02: I think it is good not to filter out anything. If we know about a pod but it is in an inconsistent state, send it to the plugin during Sync, but clearly state its status. Delaying plugin registration might not be the best option: pod creation might be stuck for a significant amount of time, so the NRI plugin might start failing its liveness/readiness probes (if deployed via a DaemonSet).
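Roughly what I have in mind, as a sketch only: build the synchronization list without filtering and attach the state to every entry, so the plugin can decide what to do with pods that are not fully created. All types and names here (`State`, `Sandbox`, `SyncEntry`, `BuildSyncList`) are hypothetical stand-ins, and the `Complete` flag is my assumption about how incompleteness could be signalled.

```go
package main

import "fmt"

// State is a hypothetical local stand-in for the sandbox states discussed
// in this thread; it is not the containerd type.
type State string

const (
	StateReady    State = "READY"
	StateNotReady State = "NOT_READY"
	StateUnknown  State = "UNKNOWN"
)

// Sandbox is a simplified record of what the runtime knows about a pod.
type Sandbox struct {
	ID    string
	State State
}

// SyncEntry is what a plugin would receive per pod during synchronization.
type SyncEntry struct {
	ID       string
	State    State
	Complete bool // false when the pod information may still be incomplete
}

// BuildSyncList relays every known sandbox, including those in StateUnknown,
// and labels each entry with its state instead of dropping it.
func BuildSyncList(sandboxes []Sandbox) []SyncEntry {
	entries := make([]SyncEntry, 0, len(sandboxes))
	for _, sb := range sandboxes {
		entries = append(entries, SyncEntry{
			ID:       sb.ID,
			State:    sb.State,
			Complete: sb.State != StateUnknown,
		})
	}
	return entries
}

func main() {
	entries := BuildSyncList([]Sandbox{
		{ID: "pod-1", State: StateReady},
		{ID: "pod-2", State: StateUnknown}, // creation still in flight
	})
	for _, e := range entries {
		fmt.Printf("%s state=%s complete=%t\n", e.ID, e.State, e.Complete)
	}
}
```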
We have a plugin that monitors for RunPodSandbox events. We observed that if a RunPodSandbox request is in flight while the NRI plugin starts up and registers, the pod sandbox event is missed: it is delivered in neither Synchronize nor RunPodSandbox.

Here's the timeline:
1. […] RunPodSandbox creation event
2. […] RunPodSandbox, and creates a pod sandbox in sandboxstore.StateUnknown
3. […] RunPodSandbox NRI event (because no NRI plugin is registered just yet)
4. […] sandboxstore.StateUnknown
5. RunPodSandbox completes
6. The RunPodSandbox event was missed from both the Synchronize call and the RunPodSandbox NRI events!

Expected behavior:
I would expect every pod sandbox event to be delivered in either Synchronize or RunPodSandbox. One approach to consider is for Synchronize to return pod sandbox creations that are in flight (i.e. don't exclude Unknown state pod sandboxes).
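For illustration, the race can be reproduced with a toy model. This is a sketch under simplifying assumptions, not containerd code: `Runtime`, `Plugin` and their methods are made-up names, a sandbox is stored in an unknown state when creation starts, the creation event is only emitted if a plugin is already registered, and synchronization skips sandboxes that are still in the unknown state.

```go
package main

import "fmt"

// State is a toy stand-in for the sandbox state.
type State int

const (
	StateUnknown State = iota // creation in flight
	StateReady
)

// Plugin records what it was told, and through which path.
type Plugin struct {
	Synced []string // pods delivered via synchronization
	Events []string // pods delivered via creation events
}

// Runtime is a toy model of the runtime side; all names are hypothetical.
type Runtime struct {
	sandboxes map[string]State
	order     []string // insertion order, for deterministic iteration
	plugin    *Plugin  // nil until a plugin registers
}

func NewRuntime() *Runtime {
	return &Runtime{sandboxes: make(map[string]State)}
}

// StartRunPodSandbox stores the sandbox in StateUnknown and emits the
// creation event only if a plugin is already registered.
func (r *Runtime) StartRunPodSandbox(id string) {
	r.sandboxes[id] = StateUnknown
	r.order = append(r.order, id)
	if r.plugin != nil {
		r.plugin.Events = append(r.plugin.Events, id)
	}
}

// FinishRunPodSandbox moves the sandbox out of the transient state.
func (r *Runtime) FinishRunPodSandbox(id string) {
	r.sandboxes[id] = StateReady
}

// Register attaches the plugin and synchronizes it, skipping sandboxes that
// are still in StateUnknown (the behavior reported in this issue).
func (r *Runtime) Register(p *Plugin) {
	r.plugin = p
	for _, id := range r.order {
		if r.sandboxes[id] == StateUnknown {
			continue
		}
		p.Synced = append(p.Synced, id)
	}
}

func main() {
	rt := NewRuntime()
	rt.StartRunPodSandbox("pod-1")
	rt.FinishRunPodSandbox("pod-1")

	rt.StartRunPodSandbox("pod-2") // in flight when the plugin registers
	p := &Plugin{}
	rt.Register(p)
	rt.FinishRunPodSandbox("pod-2")

	fmt.Println("synced:", p.Synced, "events:", p.Events)
}
```

In this model pod-2 shows up in neither list. Dropping the `StateUnknown` check in `Register` would make it appear in `Synced`.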