[YUNIKORN-2837] Log & Send Events, Improve logging #957
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff            @@
##           master     #957    +/-  ##
=======================================
  Coverage   80.84%   80.85%
=======================================
  Files          97       97
  Lines       12512    12517     +5
=======================================
+ Hits        10115    10120     +5
  Misses       2126     2126
  Partials      271      271
☔ View full report in Codecov by Sentry.
I'm -1 on this entire approach (see comments on JIRA for a detailed explanation).
I'm also -1, we need to be careful about what we send & how often.
pkg/scheduler/objects/allocation.go (Outdated)
func (a *Allocation) setPreemptionPreConditionsCheckPassed() {
	a.Lock()
	defer a.Unlock()
	if a.preemptionPreConditionsCheckFailed {
		a.preemptionPreConditionsCheckFailed = false
		a.askEvents.SendPreemptionPreConditionsCheckPassed(a.allocationKey, a.applicationID, a.allocatedResource)
	}
}

func (a *Allocation) setPreemptionPreConditionsCheckFailed() {
	a.Lock()
	defer a.Unlock()
	if !a.preemptionPreConditionsCheckFailed {
		a.preemptionPreConditionsCheckFailed = true
		a.askEvents.SendPreemptionPreConditionsCheckFailed(a.allocationKey, a.applicationID, a.allocatedResource)
	}
}

func (a *Allocation) setPreemptionQueueGuaranteesCheckPassed() {
	a.Lock()
	defer a.Unlock()
	if a.preemptionQueueGuaranteesCheckFailed {
		a.preemptionQueueGuaranteesCheckFailed = false
		a.askEvents.SendPreemptionQueueGuaranteesCheckPassed(a.allocationKey, a.applicationID, a.allocatedResource)
	}
}

func (a *Allocation) setPreemptionQueueGuaranteesCheckFailed() {
	a.Lock()
	defer a.Unlock()
	if !a.preemptionQueueGuaranteesCheckFailed {
		a.preemptionQueueGuaranteesCheckFailed = true
		a.askEvents.SendPreemptionQueueGuaranteesCheckFailed(a.allocationKey, a.applicationID, a.allocatedResource)
	}
}
These are all REQUEST events, which means they will be sent to the respective pods.
I think the only message which might be useful is an indication that a particular pod successfully triggered preemption. Or the other way around: we send a message to the terminated pod that it's been preempted to make room for pod X.
Everything else (guarantees passed/failed, precondition passed/failed) is just too much noise which should not be sent to K8s as a pod event.
The headroom/quota events are different because they're related to YK configuration. Hitting the quota is useful feedback for the user, because they'll know why a particular pod cannot be scheduled.
I agree we should not send events except in these cases:
- Request pod: "Preempting pod {namespace}/{pod} to make room" (there could be multiple of these if we preempt multiple pods)
- Victim pods: "Preempting pod to make room for {namespace}/{pod}" (only one of these per victim)
This allows users to see that their pod triggered preemption, or that their pod was preempted and by whom.
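A minimal sketch of how those two messages could be built, assuming hypothetical helper and pod names for illustration only (this is not the YuniKorn event API):
package main

import "fmt"

// preemptionMessages builds the two event texts suggested above: one for the
// requesting pod (repeated per victim) and one for each victim pod (sent once).
func preemptionMessages(requestPod, victimPod string) (forRequest, forVictim string) {
	// Sent to the request pod; there may be several of these if multiple pods are preempted.
	forRequest = fmt.Sprintf("Preempting pod %s to make room", victimPod)
	// Sent to the victim pod; only one per victim.
	forVictim = fmt.Sprintf("Preempting pod to make room for %s", requestPod)
	return forRequest, forVictim
}

func main() {
	reqMsg, victimMsg := preemptionMessages("ns-a/request-pod", "ns-b/victim-pod")
	fmt.Println(reqMsg)
	fmt.Println(victimMsg)
}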
As of now, there is only one log line, "Reserving node for ask after preemption", logged at the end of the preemption process. Other than this, we don't have any info.
We need the below info to debug preemption related issues:
- How do we know whether preemption has even been attempted?
- If attempted, is there any problem in passing through the important steps?
- If attempted and successful, how do we know? Answer: the above log line helps
I went through all these questions while trying to find the reason for test failures logged in https://issues.apache.org/jira/browse/YUNIKORN-2808.
I think the only message which might be useful is an indication that a particular pod successfully triggered preemption.
Yes, it definitely helps; we need something like this. I also had a similar thought, but doing it that way would always send an event. Hence, I followed the other usual model: 1. Send only one event, and only when the check fails; it does not repeat. 2. Send only one event when it succeeds (only if the first event happened); it does not repeat.
Request pod: "Preempting pod {namespace}/{pod} to make room" (there could be multiple of these if we preempt multiple pods)
Victim pods: "Preempting pod to make room for {namespace}/{pod}" (only one of these per victim)
Yes, it helps, but only in the successful case and only at the end.
We must not log for every attempt as this happens far too frequently. It will overwhelm the system. Currently in the scheduler we only log when action is taken or the state of the system changes.
Pod events definitely should only be sent once, and only when action is taken.
I think an acceptable approach is recording stuff like the allocation log for each request. That looks reasonable to me. We can have a type like AllocationLog and define fixed causes for why the preemption failed:
- "Preconditions checks failed" (preemptor.CheckPreconditions() returns false)
- "No guarantee to free up resources" (p.checkPreemptionQueueGuarantees() returns false)
- "Preemption shortfall" (p.ask.GetAllocatedResource().StrictlyGreaterThanOnlyExisting() returns false)
Every time something fails, you increase a counter for it. You also count the number of total attempts.
So I'm thinking something like:
// Mutable fields which need protection
allocated bool
allocLog map[string]*AllocationLogEntry
preemptionLog map[string]*AllocationLogEntry // new field
preemptionAttempts int64 // new field
preemptionTriggered bool
preemptCheckTime time.Time
...
var (
	ErrPreemptionPrecondFailed = errors.New("Preconditions checks failed")
	ErrPreemptionNoGuarantee   = errors.New("No guarantee to free up resources")
	ErrPreemptionShortfall     = errors.New("Preemption shortfall")
)
// OR perhaps better w/ typed errors
type PreemptionError int

const (
	ErrPreemptionPrecondFailed PreemptionError = iota
	ErrPreemptionNoGuarantee
	ErrPreemptionShortfall
)
Then in application.go:
if fullIterator != nil {
	request.IncPreemptionAttempts()
	result, err := sa.tryPreemption(headRoom, preemptionDelay, request, fullIterator, false)
	if err == nil {
		// preemption occurred, and possibly reservation
		return result
	}
	request.LogPreemptionFailure(err)
}
Obviously you need to make sure that problems propagate back so you can log them. Then this whole thing can be exposed on the REST API just like the allocation log.
It's obviously not as detailed as a separate log entry printed to stdout, but I see this as a good compromise.
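A rough sketch of that compromise, assuming hypothetical Request, IncPreemptionAttempts, and LogPreemptionFailure names taken from the snippet above rather than the actual YuniKorn types:
package main

import (
	"errors"
	"fmt"
	"sync"
)

// AllocationLogEntry mirrors the idea of the existing allocation log:
// a fixed message plus a counter of how often that cause was hit.
type AllocationLogEntry struct {
	Message string
	Count   int64
}

// Request stands in for the scheduler request object; only the fields
// relevant to the preemption log are shown.
type Request struct {
	sync.Mutex
	preemptionAttempts int64
	preemptionLog      map[string]*AllocationLogEntry
}

// IncPreemptionAttempts counts every preemption attempt for this request.
func (r *Request) IncPreemptionAttempts() {
	r.Lock()
	defer r.Unlock()
	r.preemptionAttempts++
}

// LogPreemptionFailure records one failure cause by bumping its counter
// instead of emitting a new log line or pod event each time.
func (r *Request) LogPreemptionFailure(err error) {
	r.Lock()
	defer r.Unlock()
	if r.preemptionLog == nil {
		r.preemptionLog = make(map[string]*AllocationLogEntry)
	}
	entry, ok := r.preemptionLog[err.Error()]
	if !ok {
		entry = &AllocationLogEntry{Message: err.Error()}
		r.preemptionLog[err.Error()] = entry
	}
	entry.Count++
}

func main() {
	errPrecond := errors.New("Preconditions checks failed")
	r := &Request{}
	r.IncPreemptionAttempts()
	r.LogPreemptionFailure(errPrecond)
	r.IncPreemptionAttempts()
	r.LogPreemptionFailure(errPrecond)
	fmt.Printf("attempts=%d, %q count=%d\n", r.preemptionAttempts, errPrecond.Error(), r.preemptionLog[errPrecond.Error()].Count)
}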
Why not just reuse AllocationLog? It already contains events pertaining to things like predicate failures; preemption attempts can be tracked there easily as well.
Yeah, that's fine too.
I had used allocationLog earlier too in a few places, in addition to events & logging. Now I am using only allocationLog and have removed the events and logs. I still see a gap for adding debug logs in a few places, especially after the guarantee check passes. Anyway, we can address this later depending on the need.
-1. Re-requesting a review when nothing has changed isn't helpful.
Force-pushed from 248573e to f996239
Looks good, just minor things
LGTM, I found only a nit
Force-pushed from 586ecc2 to fde38f3
Force-pushed from fde38f3 to 1365158
What is this PR for?
Send events in appropriate places, enhance logging, etc., to help understand the preemption flow while debugging preemption related issues.
What type of PR is it?
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-2837
How should this be tested?
Screenshots (if appropriate)
Questions: