DevGuide CellStates

Oliver Kennedy edited this page Jun 11, 2021 · 16 revisions

In the typical steady state, a Workflow converges to an ordered sequence of Cell objects in the DONE state. The Result object associated with each cell describes the effects (messages and state updates) of running the computation described by the associated Module on the state produced by running each of the preceding cells in sequential order.

State Diagram

Because code execution is neither instantaneous nor always successful, cells reach this point through the following state diagram:

Clone or Thaw Cell With 
Valid Input Provenance
  \
   `-------> WAITING ------.
                |           \
W/ Invalid      V            |
     Input -> STALE ---------+
 Provenance     |            |
                V            V
             RUNNING --> CANCELLED
                |    \
                |     \
                V      V 
               DONE   ERROR

We make a distinction between pending states (WAITING, BLOCKED, STALE, RUNNING) and final states (DONE, CANCELLED, ERROR, FROZEN).
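The pending/final split can be written down as a small enumeration. This is a minimal sketch in Python for illustration only; the names `CellState`, `PENDING_STATES`, `FINAL_STATES`, and `is_pending` are assumptions and are not taken from the Vizier code base.

```python
from enum import Enum


class CellState(Enum):
    """The eight cell states named in this guide (hypothetical type)."""
    WAITING = "WAITING"
    BLOCKED = "BLOCKED"
    STALE = "STALE"
    RUNNING = "RUNNING"
    DONE = "DONE"
    CANCELLED = "CANCELLED"
    ERROR = "ERROR"
    FROZEN = "FROZEN"


# Pending states, exactly as listed in the text above.
PENDING_STATES = frozenset(
    {CellState.WAITING, CellState.BLOCKED, CellState.STALE, CellState.RUNNING}
)

# Every state that is not pending is final.
FINAL_STATES = frozenset(CellState) - PENDING_STATES


def is_pending(state: CellState) -> bool:
    """Return True if the state is one of the pending states."""
    return state in PENDING_STATES
```

Deriving the final set as the complement of the pending set keeps the two sets from drifting apart if a state is ever added.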

State Definitions

Cell execution and the state transitions shown above are managed by the Scheduler. Cell states and their Result objects are described as follows:

  • WAITING: The input scope to this cell is not sufficiently defined to determine whether this cell is STALE or DONE. The cell may or may not have a Result object. If it does, the provenance is valid. If the input provenance is unchanged once the scope has been resolved, the cell can immediately transition to the DONE state and reuse the result object. If not, the cell transitions to the STALE state. A cell in the WAITING state has valid input provenance, but incomplete output provenance.

  • STALE: The input scope to this cell is sufficiently defined to assert that the cell's input provenance refers to artifact versions that differ from those in the active scope. This cell's Result object is stale and must be replaced. This cell will be re-evaluated by the Scheduler when the opportunity presents itself. A cell in the STALE state does not have valid output provenance.

  • RUNNING: A previously STALE cell whose module is currently being executed by the Scheduler. The cell may have a Result object, but this object is incomplete until execution completes and the cell transitions into the DONE state. The Result object should be ignored for all internal use, although its contents may be displayed to the user (e.g., displaying messages as they arrive). A cell in the RUNNING state has incomplete input and output provenance.

  • ERROR: A previously RUNNING cell for which execution failed (e.g., due to a syntax error in a Python cell). The Result object contains message data that describes the error, but all provenance information in the result object should be ignored. A cell in the ERROR state has incomplete input and output provenance.

  • CANCELLED: The cell was in a pending state when the workflow was aborted, OR the cell was in a WAITING, BLOCKED, or STALE state when a preceding cell entered the ERROR state. resultId, if Some(_), contains the Result of the last successful execution of the cell. This cell needs to be re-executed if resultId is None or the associated provenance data is stale. A cell in the CANCELLED state has valid input provenance, but incomplete output provenance.

  • DONE: The computation described by the cell's Module completed successfully. resultId references the result of the execution. A cell in this state does not need to be re-executed. A cell in the DONE state has valid input and output provenance.

  • FROZEN: This cell has been temporarily removed from the workflow by user request. Execution ignores this cell, and resultId references the Result from the most recent execution. Cells in this state are never re-executed. A cell in the FROZEN state has valid input and output provenance. Note that the output provenance should be ignored while the cell remains in the FROZEN state.

In summary:

State      Reusable?  In Provenance [4]  Out Provenance  Pending?  Default Clone
---------  ---------  -----------------  --------------  --------  -------------
WAITING    Maybe [1]  Valid              Invalid         Yes       WAITING
STALE      No         Valid              Invalid         Yes       STALE
RUNNING    No         Invalid            Invalid         Yes       STALE
ERROR      No         Invalid            Invalid         No        STALE
CANCELLED  Maybe [1]  Valid              Invalid         No        WAITING
DONE       Yes        Valid              Valid           No        DONE
FROZEN     Maybe [3]  Valid              Ignore (empty)  No        FROZEN

Notes:

  1. Cells in the WAITING or CANCELLED states may have reusable results, but this is a non-local decision. See Scheduler.updateCellState.
  2. Cells in the RUNNING state have invalid results until cell execution completes (whether successfully or not). The result may be reusable once the cell enters the DONE state.
  3. Although FROZEN cells are never executed, it may be possible to re-use the cell state when the cell is thawed.
  4. A cell in the RUNNING or ERROR state has a result object, but the input and output artifacts encoded in this result object do not completely describe the cell's provenance. When transitioning/cloning a cell from this state into a state with a valid input provenance (e.g., ERROR → FROZEN), the result object MUST be cleared (with the exception of the RUNNING → DONE transition).
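The "Default Clone" column and note 4 can be combined into one lookup. The following is a minimal sketch, not the actual implementation: the `Cell` dataclass, the `result_id` field, and the `clone_cell` helper are hypothetical names, and states are plain strings to keep the example short.

```python
from dataclasses import dataclass, replace
from typing import Optional

# "Default Clone" column of the summary table: current state -> state of the clone.
DEFAULT_CLONE = {
    "WAITING": "WAITING",
    "STALE": "STALE",
    "RUNNING": "STALE",
    "ERROR": "STALE",
    "CANCELLED": "WAITING",
    "DONE": "DONE",
    "FROZEN": "FROZEN",
}

# States whose input provenance is marked Invalid in the table (see note 4).
INVALID_INPUT_PROVENANCE = {"RUNNING", "ERROR"}


@dataclass(frozen=True)
class Cell:
    state: str
    result_id: Optional[int] = None  # None means "no result object"


def clone_cell(cell: Cell) -> Cell:
    """Clone a cell using the default transition, clearing unusable results."""
    new_state = DEFAULT_CLONE[cell.state]
    # Note 4: a result from RUNNING or ERROR does not completely describe the
    # cell's provenance, so it is dropped when the clone lands in a state
    # with valid input provenance (STALE, for both of these source states).
    keep_result = cell.state not in INVALID_INPUT_PROVENANCE
    return replace(cell, state=new_state, result_id=cell.result_id if keep_result else None)
```

For example, cloning `Cell("ERROR", 7)` gives `Cell("STALE", None)`, while cloning `Cell("CANCELLED", 3)` gives `Cell("WAITING", 3)` with its result kept for possible reuse.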

State Transitions

The above table describes default cell behavior when the workflow is modified. Each modification deviates from this behavior as follows:

  • append: The new cell enters in the STALE state.
  • insert: The new cell enters in the STALE state. All DONE cells after the insertion point move to the WAITING state.
  • delete: All DONE cells after the deletion point enter the WAITING state.
  • update: The updated cell moves to the STALE state. All DONE cells after the updated cell move to the WAITING state.
  • freezeOne: The frozen cell moves to the FROZEN state. All DONE cells after the modification point move to the WAITING state. If the frozen cell is in an invalid provenance state, the result is cleared.
  • thawOne: The thawed cell moves to the WAITING state. All DONE cells after the modification point move to the WAITING state.
  • freezeFrom: All cells at and after the modification point move to the FROZEN state. Any cells being frozen in invalid provenance states have their results cleared.
  • thawFrom: All cells at and after the modification point move to the WAITING state.
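As an illustration, the insert, delete, and update rules above can be sketched over a plain list of state names. This is a simplified model with hypothetical function names; it shows only the deviations listed above and leaves every other cell's state untouched instead of applying the default clone transitions.

```python
from typing import List


def _invalidate_done(states: List[str]) -> List[str]:
    """Move every DONE cell in the given slice to WAITING; leave others alone."""
    return ["WAITING" if s == "DONE" else s for s in states]


def insert_cell(states: List[str], position: int) -> List[str]:
    """insert: the new cell enters as STALE; later DONE cells become WAITING."""
    return states[:position] + ["STALE"] + _invalidate_done(states[position:])


def delete_cell(states: List[str], position: int) -> List[str]:
    """delete: the cell is removed; later DONE cells become WAITING."""
    return states[:position] + _invalidate_done(states[position + 1:])


def update_cell(states: List[str], position: int) -> List[str]:
    """update: the updated cell becomes STALE; later DONE cells become WAITING."""
    return states[:position] + ["STALE"] + _invalidate_done(states[position + 1:])
```

Cells before the modification point are never touched, because their inputs cannot depend on anything at or after it.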

Additionally, the following describe in-situ changes to the workflow.

  • complete: The affected cell moves to the DONE state. All subsequent WAITING cells with a dependency on any outputs of the affected cell move to the STALE state. All other WAITING cells prior to the first STALE cell move to the DONE state. The first STALE cell moves to the RUNNING state.
  • abort: All cells in the WAITING, BLOCKED, STALE, and RUNNING states move to the CANCELLED state. All cells originally in an invalid provenance state have their results cleared.
  • error: The affected cell moves to the ERROR state. All subsequent cells in the WAITING, BLOCKED, STALE, and RUNNING states move to the CANCELLED state. All cells originally in an invalid provenance state have their results cleared.
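The abort and error rules share the same cancellation step, which the sketch below factors out. The `Cell` dataclass and function names are hypothetical. As an assumption, results are cleared only for cells that are being cancelled, and the failing cell keeps its result because it carries the error messages; the actual Scheduler may differ.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

PENDING = {"WAITING", "BLOCKED", "STALE", "RUNNING"}
INVALID_PROVENANCE = {"RUNNING", "ERROR"}


@dataclass(frozen=True)
class Cell:
    state: str
    result_id: Optional[int] = None


def _cancel(cell: Cell) -> Cell:
    """Move a pending cell to CANCELLED; non-pending cells are returned as-is."""
    if cell.state not in PENDING:
        return cell
    # A result from an invalid provenance state cannot be reused, so drop it.
    result_id = None if cell.state in INVALID_PROVENANCE else cell.result_id
    return replace(cell, state="CANCELLED", result_id=result_id)


def abort(cells: List[Cell]) -> List[Cell]:
    """abort: every pending cell in the workflow is cancelled."""
    return [_cancel(c) for c in cells]


def error(cells: List[Cell], index: int) -> List[Cell]:
    """error: the cell at `index` fails; pending cells after it are cancelled."""
    failed = replace(cells[index], state="ERROR")
    return cells[:index] + [failed] + [_cancel(c) for c in cells[index + 1:]]
```

Both functions return new lists and leave their input unchanged, which mirrors the idea that a modification produces a new version of the workflow.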

The Cell Update Algorithm

This algorithm is implemented in Provenance.updateCellStates. The basic flow is:

  1. Initialize an empty scope and start at the first cell.
  2. Update the cell state.
  3. Update the scope.
  4. Go to the next cell and repeat from step 2 until no cells remain.

Update Cell State

The main purpose of this step is to attempt to resolve cells in the WAITING state into either the DONE or STALE state.

A WAITING cell can be resolved into the DONE state if its input provenance is consistent with the scope. Specifically: For each artifact reference in the input provenance, we compare the reference's artifactId to the artifactId of the scope entry with the same userFacingName. The input provenance is consistent with the scope if all userFacingNames in the input provenance are present in the scope, and corresponding artifactIds are identical.

If the WAITING cell does not have a result object (i.e., has not yet been executed), or its input provenance is not consistent with the scope, it is resolved into the STALE state.
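The consistency test described above amounts to a lookup per artifact reference. Here is a minimal sketch that models both the input provenance and the scope as dictionaries from userFacingName to artifactId; the function names and the use of `None` for "no result object" are illustrative assumptions.

```python
from typing import Dict, Optional

# userFacingName -> artifactId
Provenance = Dict[str, int]


def is_consistent(input_provenance: Provenance, scope: Provenance) -> bool:
    """True if every name read by the cell is in scope with the same artifactId."""
    return all(
        name in scope and scope[name] == artifact_id
        for name, artifact_id in input_provenance.items()
    )


def resolve_waiting(input_provenance: Optional[Provenance], scope: Provenance) -> str:
    """Resolve a WAITING cell to DONE or STALE.

    `input_provenance` is None when the cell has no result object yet.
    """
    if input_provenance is None:
        return "STALE"
    return "DONE" if is_consistent(input_provenance, scope) else "STALE"
```

Note that a cell with an empty input provenance is trivially consistent with any scope, and that extra names in the scope that the cell never read do not affect the outcome.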

This step is also used to do a little cleanup on cells in the CANCELLED and ERROR states. These should never actually show up by the time the algorithm is called (as cloning the workflow removes these states). In the interest of defensive programming, the default transitions are enforced here as well. If the cell is in the ERROR state, it is moved into the STALE state. Likewise, cells in the CANCELLED state are treated as being in the WAITING state.

Update Scope

The scope is a mapping from the userFacingNames of artifacts to the artifactId of the artifact that the name encodes. This step maintains the scope by incorporating the outcome of a cell's execution (the output provenance) into the scope. Scope updates are computed based on the cell's state after being updated in the prior step, as follows:

  • WAITING: The prior step will never result in a cell in the WAITING state.
  • STALE: The output provenance of this cell is not valid, so we are unable to derive status information for subsequent cells; the algorithm ends.
  • RUNNING: The output provenance of this cell is not valid, so we are unable to derive status information for subsequent cells; the algorithm ends.
  • ERROR: The output provenance of this cell is not valid, so we are unable to derive status information for subsequent cells; the algorithm ends.
  • DONE: userFacingName/artifactId pairs in the output provenance are merged into the scope.
  • FROZEN: This cell is treated as if it were not present in the workflow. The scope is not modified.
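Putting the two steps together, the whole pass can be sketched as a single loop. This is an illustrative Python model, not the code in Provenance.updateCellStates: the `Cell` dataclass, its `inputs` and `outputs` fields, and the function name are assumptions. The source does not say how BLOCKED cells are handled in the scope step, so the sketch stops at them as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cell:
    state: str
    # Input provenance (userFacingName -> artifactId); None = no result object.
    inputs: Optional[Dict[str, int]] = None
    # Output provenance (userFacingName -> artifactId).
    outputs: Dict[str, int] = field(default_factory=dict)


def update_cell_states(cells: List[Cell]) -> Dict[str, int]:
    """Walk the cells in order, updating states in place; return the final scope."""
    scope: Dict[str, int] = {}  # step 1: start with an empty scope
    for cell in cells:
        # Step 2a: defensive cleanup of states that cloning should have removed.
        if cell.state == "ERROR":
            cell.state, cell.inputs, cell.outputs = "STALE", None, {}
        elif cell.state == "CANCELLED":
            cell.state = "WAITING"
        # Step 2b: resolve WAITING into DONE or STALE.
        if cell.state == "WAITING":
            consistent = cell.inputs is not None and all(
                scope.get(name) == artifact_id
                for name, artifact_id in cell.inputs.items()
            )
            cell.state = "DONE" if consistent else "STALE"
        # Step 3: update the scope, or stop if the outputs cannot be trusted.
        if cell.state == "DONE":
            scope.update(cell.outputs)
        elif cell.state != "FROZEN":
            break  # STALE, RUNNING, ERROR (and, as an assumption, BLOCKED)
        # FROZEN cells are skipped: the scope is left unmodified.
    return scope
```

Cells after the point where the loop stops are left untouched, so a WAITING cell downstream of a STALE cell stays WAITING until a later pass can resolve it.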