DevGuide CellStates

Oliver Kennedy edited this page Jun 9, 2021 · 16 revisions

In the typical steady-state, a Workflow converges to an ordered sequence of Cell objects in the DONE state. The Result object associated with each cell describes the effects (messages and state updates) of running the computation described by the associated Module on the state produced by running each of the preceding cells in sequential order.

Because code execution is neither instantaneous nor always successful, cells reach the DONE state through the following state diagram:

Clone or Thaw Cell                           Freeze Cell
         \                                        |
          v                                       V
    --- WAITING --------------------> DONE     FROZEN
   /       |                           ^
  /    ,---+            ,-> CANCELLED  |
  |   /    v           /          ^    |
  |  |  BLOCKED ---+--+--> ERROR  |    |
  |   \..,  |     /          ^    |    /
   \     V  v   ,/           |    |   /
    `--> STALE --> RUNNING --+----+--`
           ^
          /
  New Cell (or Clone cell in the ERROR state)

Cell execution and this state transition table are managed by the Scheduler. Each cell state, and what it implies for the cell's Result, is described below.

We distinguish between pending states (WAITING, BLOCKED, STALE, RUNNING) and final states (DONE, CANCELLED, ERROR, FROZEN).
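
The pending/final split can be sketched as a small enumeration. This is an illustrative Python sketch rather than the Scheduler's actual code; the names `CellState`, `PENDING_STATES`, `is_pending`, and `is_final` are hypothetical.

```python
from enum import Enum, auto


class CellState(Enum):
    """The eight cell states named in this guide."""
    WAITING = auto()
    BLOCKED = auto()
    STALE = auto()
    RUNNING = auto()
    DONE = auto()
    CANCELLED = auto()
    ERROR = auto()
    FROZEN = auto()


# Pending states: the Scheduler still has work to do for these cells.
PENDING_STATES = frozenset({
    CellState.WAITING,
    CellState.BLOCKED,
    CellState.STALE,
    CellState.RUNNING,
})


def is_pending(state: CellState) -> bool:
    """True for WAITING, BLOCKED, STALE, and RUNNING."""
    return state in PENDING_STATES


def is_final(state: CellState) -> bool:
    """True for DONE, CANCELLED, ERROR, and FROZEN."""
    return not is_pending(state)
```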

  • WAITING: The state of the cell's Result is unknown. The cell has a Result, but a preceding cell is still running and we don't yet have enough information to guarantee that it won't invalidate this cell's Result. Accordingly, resultId may or may not be valid. If this cell transitions to the DONE state without going through the BLOCKED or STALE states, resultId will remain unchanged. The cell needs to be re-executed if the provenance data in resultId is stale or if resultId is None.

  • STALE: The cell does not have a valid Result object, either because it is freshly inserted or because a prior dependency of the cell was updated. This cell will be re-evaluated by the Scheduler when the opportunity presents itself. When the cell is in this state, resultId must be ignored, and the cell always needs to be re-executed.

  • RUNNING: The cell has an incomplete Result object. When the cell is in this state resultId should be ignored, except for notification purposes (e.g., displaying cell outputs as they arrive).

  • ERROR: The cell was in the RUNNING state at one point, but failed to evaluate completely (e.g., due to a typo in a Python cell). resultId contains a valid Result that describes the error. There is no provenance data in resultId and the cell always needs to be re-executed.

  • CANCELLED: The cell was in a pending state when the workflow was aborted OR the cell was in a WAITING, BLOCKED, or STALE state when a preceding cell entered into the ERROR state. resultId, if Some(_), contains the Result of the last successful execution of the cell. This cell needs to be re-executed if resultId is None or the associated provenance data is stale.

  • DONE: The computation described by the cell's Module completed successfully. resultId references the result of the execution. A cell in this state does not need to be re-executed.

  • FROZEN: This cell has been temporarily removed from the workflow at the user's request. Execution ignores this cell, and resultId references the Result from the most recent execution. Cells in this state are never re-executed.
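
The re-execution rules stated in the list above can be collected into one predicate. The following is a minimal Python sketch, not the Scheduler's implementation: `needs_reexecution`, `result_id`, and `provenance_is_stale` are hypothetical names, and the function deliberately refuses to answer for RUNNING and BLOCKED because the list gives no re-execution rule for them.

```python
from enum import Enum, auto
from typing import Optional


class CellState(Enum):
    WAITING = auto()
    BLOCKED = auto()
    STALE = auto()
    RUNNING = auto()
    DONE = auto()
    CANCELLED = auto()
    ERROR = auto()
    FROZEN = auto()


def needs_reexecution(
    state: CellState,
    result_id: Optional[int],
    provenance_is_stale: bool,
) -> bool:
    """Apply the per-state rules from the list above.

    `provenance_is_stale` is an assumed input: deciding it is a
    non-local question that this sketch does not attempt to answer.
    """
    if state in (CellState.STALE, CellState.ERROR):
        # These cells always need to be re-executed.
        return True
    if state in (CellState.DONE, CellState.FROZEN):
        # DONE needs no re-execution; FROZEN is never re-executed.
        return False
    if state in (CellState.WAITING, CellState.CANCELLED):
        # The old result is reusable only if it exists and is still valid.
        return result_id is None or provenance_is_stale
    # RUNNING and BLOCKED: no rule is stated, so do not guess.
    raise ValueError(f"no re-execution rule stated for {state.name}")
```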

In summary:

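| State     | Result     | Reusable? | Provenance [4] | Pending? | Default Clone |
|-----------|------------|-----------|----------------|----------|---------------|
| WAITING   | Tentative  | Maybe [1] | Valid          | Yes      | WAITING       |
| STALE     | Useless    | No        | Valid          | Yes      | STALE         |
| RUNNING   | Incomplete | No        | Invalid        | Yes      | STALE         |
| ERROR     | Complete   | No        | Invalid        | No       | STALE         |
| CANCELLED | Tentative  | Maybe [1] | Valid          | No       | WAITING       |
| DONE      | Complete   | Yes       | Valid          | No       | DONE          |
| FROZEN    | Complete   | Maybe [3] | Valid          | No       | FROZEN        |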

Notes:

  1. Cells in the WAITING or CANCELLED states may have reusable results, but this is a non-local decision. See Scheduler.updateCellState.
  2. Cells in the RUNNING state have invalid results until cell execution completes (whether successfully or not). The result may be reusable once the cell enters the DONE state.
  3. Although FROZEN cells are never executed, it may be possible to re-use the cell state when the cell is thawed.
  4. A cell in the RUNNING or ERROR state has a result object, but the input and output artifacts encoded in this result object do not completely describe the cell's provenance. When transitioning/cloning a cell from this state into a state with a valid provenance, the result object MUST be cleared (with the exception of the RUNNING → DONE transition).
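
The "Default Clone" column and note 4 can be read together as a lookup followed by a conditional reset. The Python sketch below illustrates that reading and is not the actual implementation; `Cell`, `DEFAULT_CLONE`, `INVALID_PROVENANCE`, and `clone_cell` are hypothetical names. BLOCKED is omitted from the mapping because the summary table has no row for it.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class CellState(Enum):
    WAITING = auto()
    BLOCKED = auto()
    STALE = auto()
    RUNNING = auto()
    DONE = auto()
    CANCELLED = auto()
    ERROR = auto()
    FROZEN = auto()


# "Default Clone" column of the summary table (no row exists for BLOCKED).
DEFAULT_CLONE = {
    CellState.WAITING: CellState.WAITING,
    CellState.STALE: CellState.STALE,
    CellState.RUNNING: CellState.STALE,
    CellState.ERROR: CellState.STALE,
    CellState.CANCELLED: CellState.WAITING,
    CellState.DONE: CellState.DONE,
    CellState.FROZEN: CellState.FROZEN,
}

# States whose result does not completely describe provenance (note 4).
INVALID_PROVENANCE = frozenset({CellState.RUNNING, CellState.ERROR})


@dataclass(frozen=True)
class Cell:
    state: CellState
    result_id: Optional[int] = None  # hypothetical stand-in for resultId


def clone_cell(cell: Cell) -> Cell:
    """Clone a cell into its default state, clearing the result per note 4."""
    new_state = DEFAULT_CLONE[cell.state]
    # RUNNING and ERROR both clone into STALE, a valid-provenance state,
    # so the result must be cleared rather than carried over.
    cleared = cell.state in INVALID_PROVENANCE
    return Cell(new_state, None if cleared else cell.result_id)
```

Cloning a BLOCKED cell raises `KeyError` in this sketch, since the table does not say what its default clone should be.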

The above table describes default cell behavior when the workflow is modified. Each modification deviates from this behavior as follows:

  • append: The new cell enters in the STALE state.
  • insert: The new cell enters in the STALE state. All DONE cells after the insertion point move to the WAITING state.
  • delete: All DONE cells after the deletion point enter the WAITING state.
  • update: The updated cell moves to the STALE state. All DONE cells after the updated cell move to the WAITING state.
  • freezeOne: The frozen cell moves to the FROZEN state. All DONE cells after the modification point move to the WAITING state. If the frozen cell is in an invalid provenance state, the result is cleared.
  • thawOne: The thawed cell moves to the WAITING state. All DONE cells after the modification point move to the WAITING state.
  • freezeFrom: All cells at and after the modification point move to the FROZEN state. Any cells being frozen in invalid provenance states have their results cleared.
  • thawFrom: All cells at and after the modification point move to the WAITING state.
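
As an illustration of two of these rules, the sketch below models a workflow as a plain list of cells and applies insert and freezeFrom to it. It is a simplified Python sketch under stated assumptions, not the real workflow API: `Cell`, `insert_cell`, and `freeze_from` are hypothetical names, and "invalid provenance state" is taken to mean RUNNING or ERROR, as in note 4.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import List, Optional


class CellState(Enum):
    WAITING = auto()
    BLOCKED = auto()
    STALE = auto()
    RUNNING = auto()
    DONE = auto()
    CANCELLED = auto()
    ERROR = auto()
    FROZEN = auto()


INVALID_PROVENANCE = frozenset({CellState.RUNNING, CellState.ERROR})


@dataclass(frozen=True)
class Cell:
    state: CellState
    result_id: Optional[int] = None  # hypothetical stand-in for resultId


def insert_cell(cells: List[Cell], position: int) -> List[Cell]:
    """insert: the new cell is STALE; later DONE cells become WAITING."""
    tail = [
        replace(c, state=CellState.WAITING) if c.state is CellState.DONE else c
        for c in cells[position:]
    ]
    return cells[:position] + [Cell(CellState.STALE)] + tail


def freeze_from(cells: List[Cell], position: int) -> List[Cell]:
    """freezeFrom: freeze every cell at and after `position`."""
    frozen = [
        # Results from RUNNING or ERROR cells are cleared when freezing.
        Cell(
            CellState.FROZEN,
            None if c.state in INVALID_PROVENANCE else c.result_id,
        )
        for c in cells[position:]
    ]
    return cells[:position] + frozen
```

Both functions return a new list instead of mutating their input, which keeps each modification easy to compare against the workflow it was derived from.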

Additionally, the following operations describe in-situ changes to the workflow:

  • complete: The affected cell moves to the DONE state. All subsequent WAITING cells with a dependency on any outputs of the affected cell move to the STALE state. All other WAITING cells prior to the first STALE cell move to the DONE state. The first STALE cell moves to the RUNNING state.
  • abort: All cells in the WAITING, BLOCKED, STALE, and RUNNING states move to the CANCELLED state. All cells originally in an invalid provenance state have their results cleared.
  • error: The affected cell moves to the ERROR state. All subsequent cells in the WAITING, BLOCKED, STALE, and RUNNING states move to the CANCELLED state. All cells originally in an invalid provenance state have their results cleared.
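
The three in-situ operations above can be sketched over a list of cells as follows. This is an illustrative Python sketch, not the Scheduler's code. All names are hypothetical, and the `depends_on_finished` callback stands in for the dependency check, which the text does not spell out. The sketch reads "originally in an invalid provenance state" as applying to the cells being cancelled, which in practice means RUNNING cells, and it leaves `result_id` untouched when a STALE cell starts RUNNING because no rule is stated for that step.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import Callable, List, Optional


class CellState(Enum):
    WAITING = auto()
    BLOCKED = auto()
    STALE = auto()
    RUNNING = auto()
    DONE = auto()
    CANCELLED = auto()
    ERROR = auto()
    FROZEN = auto()


PENDING = frozenset({CellState.WAITING, CellState.BLOCKED,
                     CellState.STALE, CellState.RUNNING})


@dataclass(frozen=True)
class Cell:
    state: CellState
    result_id: Optional[int] = None  # hypothetical stand-in for resultId


def _cancel(cell: Cell) -> Cell:
    """Cancel a pending cell; non-pending cells are returned unchanged."""
    if cell.state not in PENDING:
        return cell
    # A RUNNING cell's incomplete result is cleared on cancellation.
    result_id = None if cell.state is CellState.RUNNING else cell.result_id
    return Cell(CellState.CANCELLED, result_id)


def abort(cells: List[Cell]) -> List[Cell]:
    """abort: every pending cell moves to CANCELLED."""
    return [_cancel(c) for c in cells]


def error(cells: List[Cell], index: int) -> List[Cell]:
    """error: cell `index` fails; later pending cells are cancelled."""
    failed = replace(cells[index], state=CellState.ERROR)
    return cells[:index] + [failed] + [_cancel(c) for c in cells[index + 1:]]


def complete(cells: List[Cell], index: int,
             depends_on_finished: Callable[[int], bool]) -> List[Cell]:
    """complete: cell `index` finishes; later cells are re-evaluated."""
    out = list(cells)
    out[index] = replace(out[index], state=CellState.DONE)
    # Later WAITING cells that read the finished cell's outputs go STALE.
    for i in range(index + 1, len(out)):
        if out[i].state is CellState.WAITING and depends_on_finished(i):
            out[i] = replace(out[i], state=CellState.STALE)
    first_stale = next(
        (i for i, c in enumerate(out) if c.state is CellState.STALE), None)
    limit = len(out) if first_stale is None else first_stale
    # WAITING cells before the first STALE cell keep their results.
    for i in range(limit):
        if out[i].state is CellState.WAITING:
            out[i] = replace(out[i], state=CellState.DONE)
    if first_stale is not None:
        out[first_stale] = replace(out[first_stale], state=CellState.RUNNING)
    return out
```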