Add support to reconcile allocated Pod IPs. #3113
base: master
Conversation
This is done periodically. That's how the datastore keeps track of the IPs that are used and available, and makes an ENI call if an additional ENI is needed.
I guess I meant reconcile the allocated IPs against running pods. It doesn't check, in steady state, that a pod is still around while its IP is allocated. The assumption is that if the pod was deleted, IPAMD would have been called by the CNI plugin. But I think there are some cases where this assumption breaks down, like if the CNI->ipamd gRPC call fails for any reason. There are probably other edge cases, but I couldn't fully reason out what could cause this to happen.
The CNI today only reconciles its datastore with existing pods at startup, but never again. Sometimes it's possible that IPAMD goes out of sync with the kubelet's view of the pods running on the node, if IPAMD fails or is temporarily unreachable by the CNI plugin handling the DelNetwork call from the container runtime. In such cases the CNI continues to consider the pod's IP allocated and will never free it, since it will never see another DelNetwork. This results in the CNI failing to assign IPs to new pods. This change adds a reconcile loop which periodically (once a minute) reconciles the allocated IPs against the existence of the pods' veth devices. If a veth device is not found, it frees the corresponding allocation, making the IP available for reuse. Fixes aws#3109
force-pushed from cc5834c to 9d4cc1d
ping for a review!
@@ -77,6 +77,9 @@ func _main() int {
 	// Pool manager
 	go ipamContext.StartNodeIPPoolManager()
 
+	// Pod IP allocation reconcile loop to clear up dangling pod IPs.
+	go ipamContext.PodIPAllocationReconcileLoop()
As an initial review, I am hesitant to add this additional 1-minute delay for the IP sync in the reconcile loop. I am not sure of the need for this. I will try to understand the problem you encountered and see if there is any other approach to resolve it.
I am not sure what you mean by a 1-minute delay. This does not add to the reconcile loop that reconciles pod IP allocations against IPs assigned to the node; instead, it just cleans up pod IPs that may no longer be in use because the corresponding pod is gone. It is a separate goroutine running its own loop that does only that, and it should not add any delay to the regular IP reconcile loop.
Got it.
I think the need for this separate routine itself introduces some concerns here. We will look into whether this is strictly needed.
Any update on this?
What type of PR is this?
improvement
Which issue does this PR fix?:
Fixes #3109
What does this PR do / Why do we need it?:
The CNI today only reconciles its datastore with existing pods at startup, but never again. Sometimes it's possible that IPAMD goes out of sync with the kubelet's view of the pods running on the node, if IPAMD fails or is temporarily unreachable by the CNI plugin handling the DelNetwork call from the container runtime.
In such cases the CNI continues to consider the pod's IP allocated and will never free it, since it will never see another DelNetwork. This results in the CNI failing to assign IPs to new pods.
This change adds a reconcile loop which periodically (once a minute) reconciles the allocated IPs against the existence of the pods' veth devices. If a veth device is not found, it frees the corresponding allocation, making the IP available for reuse.
Fixes #3109
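A single pass of the veth-based cleanup described above could be sketched roughly like this. The names (`reconcileOnce`, the allocation map, the `free` callback) are hypothetical stand-ins for the daemon's datastore, not the PR's actual code:

```go
package main

import (
	"fmt"
	"net"
)

// hostVethExists reports whether a network device with the given name is
// present on the host. The real CNI records the host-side veth name for
// each pod it has set up.
func hostVethExists(name string) bool {
	_, err := net.InterfaceByName(name)
	return err == nil
}

// reconcileOnce walks every allocation (host veth name -> pod IP); if the
// veth device no longer exists, the pod is gone, so the IP is freed and
// the dangling allocation forgotten.
func reconcileOnce(allocations map[string]string, exists func(string) bool, free func(ip string)) {
	for veth, ip := range allocations {
		if !exists(veth) {
			free(ip)
			delete(allocations, veth) // safe: Go permits delete during range
		}
	}
}

func main() {
	// "lo" exists on any Linux host; "enideadbeef" almost certainly does
	// not, so its IP should be freed on this pass.
	allocs := map[string]string{
		"lo":          "10.0.0.5",
		"enideadbeef": "10.0.0.6",
	}
	reconcileOnce(allocs, hostVethExists, func(ip string) {
		fmt.Printf("freed dangling IP %s\n", ip)
	})
}
```

Injecting the `exists` check as a function also keeps the pass unit-testable without touching real network devices, which matches the "Added unit tests" note below.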
Testing done on this change:
Added unit tests.
Will this PR introduce any new dependencies?:
No
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No
Does this change require updates to the CNI daemonset config files to work?:
No
Does this PR introduce any user-facing change?:
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.