Mains celery worker and RabbitMQ FIFO behavior is not fault-tolerant when plugin instance status checks are not working #558

jennydaman · 2024-06-05T20:09:09Z

CUBE polls pfcon for the status of plugin instances. However, in certain failure modes the poll cannot succeed. For example:

- pfcon is unreachable (temporary failure)
- the job for the plugin instance was deleted from Kubernetes (permanent failure)

CUBE continues to retry these polls. The problem is that some of these failures are permanent, so CUBE keeps retrying a poll that can never succeed, indefinitely.
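To make the temporary/permanent distinction concrete, here is a minimal sketch of how a poller could stop retrying. It is not taken from CUBE's code: `PollOutcome`, `PollState`, `record_poll`, and the `max_failures` cap are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class PollOutcome(Enum):
    OK = "ok"
    TEMPORARY_FAILURE = "temporary_failure"  # e.g. pfcon is unreachable
    PERMANENT_FAILURE = "permanent_failure"  # e.g. job deleted from Kubernetes


@dataclass
class PollState:
    """Hypothetical per-plugin-instance bookkeeping."""
    consecutive_failures: int = 0
    gave_up: bool = False


def record_poll(state, outcome, max_failures=5):
    """Update state after one poll; return True if polling should continue."""
    if outcome is PollOutcome.OK:
        state.consecutive_failures = 0
        return True
    state.consecutive_failures += 1
    # Stop at once on a permanent failure, or after too many failures in a row.
    if (outcome is PollOutcome.PERMANENT_FAILURE
            or state.consecutive_failures >= max_failures):
        state.gave_up = True
        return False
    return True
```

In this sketch a temporary failure is retried until the cap is reached, while a permanent failure ends polling immediately.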

CUBE polls plugin instances in first-in, first-out (FIFO) order. This means that if too many plugin instances are stuck in an errored state, newer plugin instances that are working are never polled.
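The starvation effect can be reproduced with a toy model. This is a simplified simulation, not CUBE's actual scheduling: it assumes each polling cycle handles only the first `budget` instances in FIFO order, that a healthy instance finishes after one poll, and that a stuck instance never leaves its place in the queue. The function name and parameters are hypothetical.

```python
def simulate_fifo_polling(instances, budget, cycles):
    """Return the names of instances polled at least once.

    instances: list of (name, is_stuck) tuples in arrival order.
    budget:    assumed number of instances polled per cycle.
    cycles:    number of polling cycles to simulate.
    """
    pending = list(instances)
    polled = set()
    for _ in range(cycles):
        head, tail = pending[:budget], pending[budget:]
        polled.update(name for name, _ in head)
        # Healthy instances finish and leave; stuck ones keep their place.
        still_waiting = [(name, stuck) for name, stuck in head if stuck]
        pending = still_waiting + tail
    return polled
```

With three stuck instances ahead of two healthy ones and a budget of three, the healthy instances are never reached no matter how many cycles run. With only two stuck instances, one slot per cycle stays free and the healthy ones are eventually polled.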


jennydaman commented Jun 5, 2024 • edited by Sandip117


Suggested solutions
