address issue with max concurrent and work fetch #5755
Merged
Fixes #5749 (hopefully)
Max concurrent is a limit on jobs, not processor instances.
The work fetch logic made the erroneous implicit assumption
that all jobs use 1 CPU.
So e.g. if a project has max concurrent 4,
and the client has two 2-CPU jobs (4 CPUs in use),
it will think (if work buf is zero) that there's
no point in fetching more work.
But in fact the project could run 4 such jobs and use 8 CPUs,
so on a host with at least 8 CPUs, 4 are idle.
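
The arithmetic in this example can be sketched as follows. This is only an illustration, not the client's code: `ProjectLoad` and the three helper functions are hypothetical names, every job is assumed to use the same number of CPUs, and the host is assumed to have 8 CPUs.

```cpp
#include <algorithm>

// Hypothetical, simplified model of one project's load (not a BOINC type).
struct ProjectLoad {
    int max_concurrent;  // MC limit: maximum number of simultaneous jobs
    int running_jobs;    // jobs currently running
    int cpus_per_job;    // CPUs used by each job (assumed uniform here)
};

// Erroneous view: treats the job limit as a CPU limit,
// i.e. implicitly assumes every job uses 1 CPU.
inline int cpus_usable_assuming_one_cpu_jobs(const ProjectLoad& p) {
    return p.max_concurrent;
}

// Corrected view: the limit is on jobs, so usable CPUs scale with
// the CPUs per job, capped by what the host actually has.
inline int cpus_usable(const ProjectLoad& p, int host_cpus) {
    return std::min(host_cpus, p.max_concurrent * p.cpus_per_job);
}

// CPUs the project's running jobs occupy right now.
inline int cpus_in_use(const ProjectLoad& p) {
    return p.running_jobs * p.cpus_per_job;
}
```

With `ProjectLoad{4, 2, 2}` the erroneous view reports 4 usable CPUs, which equals the 4 in use, so no work is fetched. The corrected view reports 8 on an 8-CPU host, leaving 4 that more work could fill.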
Fix: if a project has max concurrent (MC) constraints,
then for each resource compute 'mc_max_could_use':
the max # of instances the project could use, given its MC constraints.
Use this to compute the project's shortfall,
and hence to decide whether to fetch work from it.
Note: the way mc_max_could_use is computed is crude;
it takes the max over all apps,
when it's possible that only one of them has an MC constraint.
This could result in limited over-fetching,
but that's preferable to under-fetching and starvation.
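
A minimal sketch of one way such a bound could be computed, assuming a per-app job limit and a fixed number of instances per job. Only the name mc_max_could_use comes from this description. `AppInfo`, its fields, and `idle_usable_instances` are hypothetical, and the client's actual computation and shortfall accounting may differ.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-app record; field names are illustrative, not BOINC's.
struct AppInfo {
    bool has_max_concurrent;   // true if an MC limit applies to this app
    int max_concurrent;        // job limit (ignored if has_max_concurrent is false)
    double instances_per_job;  // resource instances each job of this app uses
};

// Crude upper bound on the instances of one resource the project could use:
// the max over all apps. An app without an MC limit is assumed able to use
// every instance on the host.
double mc_max_could_use(const std::vector<AppInfo>& apps, double ninstances) {
    double best = 0.0;
    for (const AppInfo& a : apps) {
        double could_use = a.has_max_concurrent
            ? std::min(ninstances, a.max_concurrent * a.instances_per_job)
            : ninstances;
        best = std::max(best, could_use);
    }
    return best;
}

// Instances the project could use but is not using now; a positive value
// suggests that fetching more work from the project is worthwhile.
double idle_usable_instances(double max_could_use, double instances_in_use) {
    return std::max(0.0, max_could_use - instances_in_use);
}
```

Because the bound is a maximum over apps, it can exceed what the project can really use when the server only has jobs for a narrower app. In this sketch that is where the limited over-fetching would come from.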
Sim: show app name in timeline