Commit
HPCC-30534 Prevent spurious workunit failed states
The Thor agent (which manages the instance queue) was spuriously setting workunits to failed if the Thor instance died, including when the instance spun down after being idle (linger).

The thoragent manages the instances that jobs target. In a default configuration (multiLingerJob), an instance spins up when a wuid needs one and spins down when idle. If a k8s exception was thrown while an instance was being recycled (e.g. 'Job has reached the specified backoff limit'), the original workunit that spun up the instance was spuriously marked failed.

The workunit should only be updated at this point if it is still marked as having a running state. Normally the workunit workflow instance should manage the final state.

Signed-off-by: Jake Smith <[email protected]>
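The guard described in this commit message can be illustrated with a minimal sketch. It is not the HPCC Platform code: `WUState`, `Workunit`, `RUNNING_STATES` and `on_instance_error` are hypothetical names, and treating only `RUNNING` as "a running state" is an assumption made for the example.

```python
from enum import Enum, auto


class WUState(Enum):
    # Hypothetical subset of workunit states, for illustration only.
    RUNNING = auto()
    COMPLETED = auto()
    FAILED = auto()


# Assumption: only RUNNING counts as "a running state" in this sketch.
RUNNING_STATES = {WUState.RUNNING}


class Workunit:
    def __init__(self, wuid, state):
        self.wuid = wuid
        self.state = state


def on_instance_error(wu, error):
    """Handle an instance error (e.g. a k8s exception raised during recycling).

    The workunit is marked failed only if it is still in a running state.
    Otherwise its final state, set elsewhere, is left untouched.
    Returns True if the workunit state was changed.
    """
    if wu.state not in RUNNING_STATES:
        return False
    wu.state = WUState.FAILED
    return True
```

With this guard, a workunit that has already completed keeps its state when its lingering instance later spins down with an error, while a workunit that is still running when the instance dies is marked failed.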