Overlay execution info messages in timeline view #3429

hamersaw · 2023-03-09T01:24:43Z

hamersaw
Mar 9, 2023
Maintainer

Motivation

The timeline UI view is marginally useful for debugging performance but has a lot of room for improvement. Integrating the runtime metrics breakdown proposed in the performance observability RFC is a step in the right direction, partitioning node executions into a collection of categorized time-series. This representation helps with the "what" but misses a lot of the "why". For example, if a particular execution has a large amount of frontend plugin overhead, this means that Flyte started the task but the backend service has not yet indicated that it has started. K8s gurus will be quick to identify that there may be scheduling contention, large image pull times, or a few other likely scenarios. However, this is not easily available to the user, even though FlytePropeller has the information. We currently store a single "reason" for the current execution status, but we may be better off tracking a time-series of reasons to better explain the execution.

Proposal

This proposal outlines a solution for overlaying a collection of human-readable messages in the timeline view. The exact representation is VERY open for debate, but I envision something similar to Jaeger (time-series telemetry data with events), which uses a single tick mark that displays a message on hover. This solution supplies the "why", an explanation of the reported execution status, to complement the "what" in the runtime breakdown of the execution time-series. The goal is to balance utility with simplicity, displaying a "useful" number of messages to improve context.

Implementation

Currently, FlyteAdmin maintains a singular "reason" within the task execution metadata. This is updated in-place on each event from FlytePropeller, meaning the old "reasons" are not persisted. At risk of over-simplifying this, we will need to transition to maintaining a collection of "reasons" with associated timestamps. This will require updates in the following repositories:

FlyteIDL: update TaskExecutionClosure to have repeated reasons with associated timestamps.
FlyteAdmin: append to the "reason" list rather than overwriting the existing singular "reason".
FlyteConsole: correctly parse the "reason" list to annotate the timeline UI view.
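
To make the overwrite-versus-append difference concrete, here is a minimal sketch in Python. It is not FlyteAdmin code (FlyteAdmin is written in Go), and the `Reason` type, the `reasons` field, and `apply_event` are hypothetical names used only for illustration; the real FlyteIDL change may look different.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Reason:
    """Hypothetical timestamped reason; not the actual FlyteIDL message."""
    occurred_at: datetime
    message: str


@dataclass
class TaskExecutionClosure:
    """Simplified stand-in that keeps only the fields discussed here."""
    reason: str = ""                                     # today: a single value
    reasons: List[Reason] = field(default_factory=list)  # proposed: a time-series


def apply_event(closure: TaskExecutionClosure, occurred_at: datetime, message: str) -> None:
    """Record the reason carried by one task event."""
    # Current behavior: the previous reason is overwritten and lost.
    closure.reason = message
    # Proposed behavior: every reason is kept together with its timestamp.
    closure.reasons.append(Reason(occurred_at=occurred_at, message=message))


if __name__ == "__main__":
    closure = TaskExecutionClosure()
    apply_event(closure, datetime(2023, 3, 9, 1, 0, 0, tzinfo=timezone.utc), "pod is pending scheduling")
    apply_event(closure, datetime(2023, 3, 9, 1, 0, 40, tzinfo=timezone.utc), "pulling container image")
    for r in closure.reasons:
        print(r.occurred_at.isoformat(), r.message)
```

With the singular field only the last message survives; with the list, FlyteConsole would have one entry per event to place on the timeline.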

Open Questions

How should this be visualized? I will leave this discussion for more UI / UX oriented personnel.
Should we add this information to node executions / workflow executions? Currently the "reason" is only tracked for the task-level execution.
Do we need to be able to send multiple reasons in a single task event?
It is currently possible to skip phases if the execution progresses before FlytePropeller detects and processes the intermediate stage.
We could use event buffers to send multiple events instead, which is probably the better solution.
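
A rough sketch of the event-buffer idea, in Python for brevity. Everything here (`TaskEvent`, `EventBuffer`, the `send` callback) is a hypothetical illustration and not an existing FlytePropeller API: reasons observed between two evaluation rounds are queued as separate events and flushed in timestamp order, so each event still carries a single reason.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class TaskEvent:
    """Hypothetical event carrying exactly one reason."""
    occurred_at: datetime
    phase: str
    reason: str


class EventBuffer:
    """Collects events and sends them oldest-first on flush."""

    def __init__(self, send: Callable[[TaskEvent], None]) -> None:
        self._send = send
        self._pending: List[TaskEvent] = []

    def record(self, event: TaskEvent) -> None:
        self._pending.append(event)

    def flush(self) -> int:
        """Send all pending events in timestamp order; return how many were sent."""
        ordered = sorted(self._pending, key=lambda e: e.occurred_at)
        self._pending = []
        for i, event in enumerate(ordered):
            try:
                self._send(event)
            except Exception:
                # Keep the unsent events so that a later flush can retry them.
                self._pending = ordered[i:]
                raise
        return len(ordered)


if __name__ == "__main__":
    received: List[TaskEvent] = []
    buffer = EventBuffer(send=received.append)
    buffer.record(TaskEvent(datetime(2023, 3, 9, 1, 0, 40, tzinfo=timezone.utc), "INITIALIZING", "pulling container image"))
    buffer.record(TaskEvent(datetime(2023, 3, 9, 1, 0, 0, tzinfo=timezone.utc), "QUEUED", "pod is pending scheduling"))
    buffer.flush()
    print([e.phase for e in received])
```

The appeal of this option is that the event schema stays unchanged: intermediate reasons are reported as ordinary events rather than as a list packed into one event.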

hamersaw · 2023-03-09T01:26:58Z

hamersaw
Mar 9, 2023
Maintainer Author

cc @pradithya regarding this issue - I think this solution, in combination with the runtime metrics integration, would achieve your vision. Thoughts?

1 reply

pradithya Mar 9, 2023
Collaborator

> At risk of over-simplifying this, we will need to transition to maintaining a collection of "reasons" with associated timestamps

Isn't that what TaskExecutionEvent is for? I might also be over-simplifying this, but can Flyte store these events in a table (I don't think it does today, CMIIW)? This information can then be associated with the node/workflow/task execution when populating the UI.

> How should this be visualized? I will leave this discussion for more UI / UX oriented personnel.

Since you bring up Jaeger, I think that's the best way I can think of to visualize this. The tricky part is figuring out the "span" of each phase, especially if skipping phases is possible.
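
One possible way to derive spans, sketched in Python under a stated assumption: each timestamped reason is taken to last until the next reason arrives, and the final one lasts until the execution ends. The `Span` type and `to_spans` function are hypothetical names, not part of Flyte or Jaeger.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical timeline segment explained by a single reason."""
    message: str
    start: datetime
    end: datetime


def to_spans(reasons: List[Tuple[datetime, str]], execution_end: datetime) -> List[Span]:
    """Turn (timestamp, message) pairs into back-to-back spans."""
    ordered = sorted(reasons, key=lambda r: r[0])
    spans = []
    for i, (start, message) in enumerate(ordered):
        # A span ends where the next reason starts; the last one ends with the execution.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else execution_end
        spans.append(Span(message=message, start=start, end=end))
    return spans


if __name__ == "__main__":
    spans = to_spans(
        [
            (datetime(2023, 3, 9, 1, 0, 0, tzinfo=timezone.utc), "pod is pending scheduling"),
            (datetime(2023, 3, 9, 1, 0, 40, tzinfo=timezone.utc), "pulling container image"),
        ],
        execution_end=datetime(2023, 3, 9, 1, 2, 0, tzinfo=timezone.utc),
    )
    for s in spans:
        print(s.message, (s.end - s.start).total_seconds())
```

Under this assumption a skipped phase does not leave a gap: it never shows up as a reason, so the time is attributed to the preceding span, which may or may not be the attribution the UI wants.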
