CUBE polls pfcon for the status of plugin instances. However, in certain failure modes, pfcon cannot be polled successfully. For example:
pfcon is unreachable (temporary failure)
the job for the plugin instance was deleted from Kubernetes (permanent failure)
CUBE keeps retrying these polls. The problem is that some of these failures are permanent, so CUBE ends up polling them indefinitely.
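One way to separate the two cases is to classify each poll result before deciding whether to retry. The following is a minimal Python sketch, not CUBE's actual code: `PollOutcome`, `classify_poll` and `status_after_poll` are hypothetical names, and treating an HTTP 404/410 from pfcon as "the job no longer exists" is an assumption about pfcon's API. Network-level errors (pfcon unreachable) are treated as temporary, because pfcon may come back.

```python
from enum import Enum
from typing import Optional


class PollOutcome(Enum):
    OK = "ok"
    TEMPORARY_FAILURE = "temporary"  # e.g. pfcon unreachable: retry later
    PERMANENT_FAILURE = "permanent"  # e.g. job deleted from Kubernetes: stop polling


# Assumption (hypothetical): pfcon answers 404/410 when the job no longer exists.
PERMANENT_HTTP_STATUSES = {404, 410}


def classify_poll(error: Optional[Exception] = None,
                  http_status: Optional[int] = None) -> PollOutcome:
    """Classify one poll of pfcon as OK, temporary failure or permanent failure."""
    if error is not None:
        # pfcon could not be reached at all (connection refused, timeout, ...).
        # It may recover, so this is retried rather than cancelled.
        return PollOutcome.TEMPORARY_FAILURE
    if http_status is None:
        raise ValueError("either error or http_status must be given")
    if http_status in PERMANENT_HTTP_STATUSES:
        return PollOutcome.PERMANENT_FAILURE
    if 200 <= http_status < 300:
        return PollOutcome.OK
    # Any other status (e.g. 503) is assumed to be transient.
    return PollOutcome.TEMPORARY_FAILURE


def status_after_poll(current_status: str, outcome: PollOutcome) -> str:
    """Stop polling permanent failures by marking the instance as cancelled."""
    if outcome is PollOutcome.PERMANENT_FAILURE:
        return "cancelled"
    # OK and temporary failures keep the current status here; a real
    # implementation would take the new status from pfcon's response on OK.
    return current_status
```

For example, `classify_poll(error=ConnectionError())` yields a temporary failure, while `classify_poll(http_status=404)` yields a permanent one, after which `status_after_poll` returns `"cancelled"` and the instance can be dropped from the polling queue.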
CUBE polls plugin instances in first-in, first-out (FIFO) order. This means that if too many plugin instances are stuck in an errored state, newer plugin instances that are working are never polled.
Suggested solutions
CUBE should distinguish between "temporary" and "permanent" failures. When a permanent failure is encountered, the plugin instance status should be set to "cancelled" (or some other indication of a system error).
CUBE should use a priority queue instead of a FIFO queue. Plugin instances whose polls repeatedly return a temporary failure should be deprioritized.
Document how to configure the polling queue size.
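The priority-queue suggestion can be sketched with Python's standard `heapq` module. This is an illustrative design under stated assumptions, not CUBE's implementation: `PollTask` and `PollQueue` are hypothetical names, and the priority is assumed to be the number of consecutive temporary failures, with an insertion counter as a tie-breaker so instances with equal failure counts are still polled in FIFO order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class PollTask:
    failures: int                          # consecutive temporary failures; lower is polled sooner
    seq: int                               # insertion order, breaks ties FIFO-style
    instance_id: int = field(compare=False)  # hypothetical plugin instance identifier


class PollQueue:
    """Hypothetical polling queue that deprioritizes repeatedly failing instances."""

    def __init__(self) -> None:
        self._heap: List[PollTask] = []
        self._counter = itertools.count()

    def push(self, instance_id: int, failures: int = 0) -> None:
        heapq.heappush(self._heap, PollTask(failures, next(self._counter), instance_id))

    def pop(self) -> PollTask:
        # Returns the task with the fewest failures, oldest first among equals.
        return heapq.heappop(self._heap)

    def requeue_after_temporary_failure(self, task: PollTask) -> None:
        # A temporary failure pushes the instance behind healthier ones.
        self.push(task.instance_id, task.failures + 1)

    def __len__(self) -> int:
        return len(self._heap)


# Usage: instance 1 fails temporarily, so instance 3 (added later) is polled before it.
queue = PollQueue()
queue.push(1)
queue.push(2)
stuck = queue.pop()
queue.requeue_after_temporary_failure(stuck)
queue.push(3)
order = [queue.pop().instance_id for _ in range(len(queue))]
```

Here `order` is `[2, 3, 1]`. Note that strict ordering by failure count can in turn starve deprioritized instances if healthy ones are re-enqueued continuously; a real design might cap the failure count or age priorities over time so that temporarily failing instances are still retried occasionally.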