Thick plugin graceful termination #1338

dougbtv · 2024-09-19T18:36:08Z

This PR introduces graceful shutdown functionality to the Multus daemon by adding a /readyz endpoint alongside the existing /healthz. Once a SIGTERM is received, /readyz starts returning HTTP 500 to signal that the daemon is in shutdown mode, while CNI requests can still be processed for a short window. The daemonset configs have also been updated to increase terminationGracePeriodSeconds from 10 to 30 seconds, giving the daemon more time to shut down cleanly.
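A minimal sketch of the liveness/readiness split described above, assuming a simple in-process flag. The names `shutdownState`, `markTerminating`, `newMux` and `probe` are hypothetical and only illustrate the idea; they are not the actual Multus code. /healthz keeps answering 200 while the process is alive, and /readyz flips to 500 once termination has begun.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// shutdownState is a hypothetical holder for the "terminating" flag.
// atomic.Bool (Go 1.19+) lets the signal handler and HTTP handlers share it safely.
type shutdownState struct {
	terminating atomic.Bool
}

// healthzHandler reports liveness: if the process can answer, it is alive.
func (s *shutdownState) healthzHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readyzHandler reports readiness: it returns 500 once shutdown has started,
// so callers stop treating the daemon as ready.
func (s *shutdownState) readyzHandler(w http.ResponseWriter, _ *http.Request) {
	if s.terminating.Load() {
		http.Error(w, "shutting down", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// markTerminating would be called when SIGTERM arrives.
func (s *shutdownState) markTerminating() { s.terminating.Store(true) }

// newMux wires both endpoints onto one mux.
func newMux(s *shutdownState) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", s.healthzHandler)
	mux.HandleFunc("/readyz", s.readyzHandler)
	return mux
}

// probe issues an in-process GET against h and returns the HTTP status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

Keeping liveness and readiness separate means the kubelet does not restart the container while it is deliberately draining; only the readiness signal changes.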

This addresses a race condition during pod transitions: the readiness check might report the daemon as ready, but a subsequent CNI request could still fail if the daemon shuts down too quickly. By introducing the /readyz endpoint and delaying the shutdown, we can handle in-flight CNI requests more gracefully, reducing the risk of disruptions during critical transitions.

Major thanks to @deads2k for the find, identification, fix, and of course, the explanations. Appreciate it.

coveralls · 2024-09-19T19:10:06Z

coverage: 63.822% (-0.04%) from 63.857% when pulling 531dec1 on dougbtv:thickplugin_graceful_term2 into f1e887e on k8snetworkplumbingwg:master.


dougbtv force-pushed the thickplugin_graceful_term2 branch 2 times, most recently from c187bed to dcbd737, on September 19, 2024 19:04

dougbtv mentioned this pull request on Sep 19, 2024 in openshift/multus-cni#251, "graceful termination explanation" (Closed)

dougbtv force-pushed the thickplugin_graceful_term2 branch from dcbd737 to 531dec1 on September 19, 2024 19:41

dougbtv mentioned this pull request on Sep 19, 2024 in openshift/multus-cni#252, "OCPBUGS-42236: (CARRY) Graceful shutdown functionality for multus daemon" (Merged)
