
Docs on how to write a new executor #498

Open
TomNicholas opened this issue Jul 14, 2024 · 1 comment
Labels: documentation, runtime

Comments


TomNicholas commented Jul 14, 2024

There are a lot of parallel frameworks that Cubed plans could be converted to (e.g. #499). We have executors for Dask and Beam, but instead of trying to write more executors ourselves, it would be nice to clearly document how to create a new executor so that others can contribute.

TomNicholas added the documentation label on Jul 14, 2024
tomwhite commented

That's a great suggestion. I'll sketch out something here to start with.

A Cubed executor is a subclass of DagExecutor:

class DagExecutor:
    @property
    def name(self) -> str:
        raise NotImplementedError  # pragma: no cover

    def execute_dag(self, dag: MultiDiGraph, **kwargs) -> None:
        raise NotImplementedError  # pragma: no cover

The execute_dag method is responsible for taking a Cubed plan (a DAG of operations and arrays) and turning it into sets of parallel tasks that it runs in stages. It is also responsible for issuing callbacks whenever an operation starts or a task finishes; the client uses these to update progress bars and record statistics.

The simplest example of a Cubed executor is the SingleThreadedExecutor, which runs tasks sequentially:

class SingleThreadedExecutor(DagExecutor):
    """The default execution engine that runs tasks sequentially using Python loops."""

    @property
    def name(self) -> str:
        return "single-threaded"

    def execute_dag(
        self,
        dag: MultiDiGraph,
        callbacks: Optional[Sequence[Callback]] = None,
        resume: Optional[bool] = None,
        spec: Optional[Spec] = None,
        compute_id: Optional[str] = None,
        **kwargs,
    ) -> None:
        for name, node in visit_nodes(dag, resume=resume):
            handle_operation_start_callbacks(callbacks, name)
            pipeline: CubedPipeline = node["pipeline"]
            for m in pipeline.mappable:
                exec_stage_func(
                    m,
                    pipeline.function,
                    config=pipeline.config,
                    name=name,
                    compute_id=compute_id,
                )
                if callbacks is not None:
                    event = TaskEndEvent(name=name)
                    [callback.on_task_end(event) for callback in callbacks]
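
To make the contrast with a parallel backend concrete, here is a minimal sketch of what a thread-pool executor could look like, reusing the helpers shown above. This is not part of Cubed; the ThreadPoolDagExecutor name and max_workers parameter are made up for this example.

from concurrent.futures import ThreadPoolExecutor, as_completed


class ThreadPoolDagExecutor(DagExecutor):
    """Hypothetical executor that runs each stage's tasks on a local thread pool."""

    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    @property
    def name(self) -> str:
        return "thread-pool"

    def execute_dag(
        self,
        dag: MultiDiGraph,
        callbacks: Optional[Sequence[Callback]] = None,
        resume: Optional[bool] = None,
        spec: Optional[Spec] = None,
        compute_id: Optional[str] = None,
        **kwargs,
    ) -> None:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for name, node in visit_nodes(dag, resume=resume):
                handle_operation_start_callbacks(callbacks, name)
                pipeline: CubedPipeline = node["pipeline"]
                # Submit every task input in the stage to the pool.
                futures = [
                    pool.submit(
                        exec_stage_func,
                        m,
                        pipeline.function,
                        config=pipeline.config,
                        name=name,
                        compute_id=compute_id,
                    )
                    for m in pipeline.mappable
                ]
                # Issue a callback as each task completes so progress bars update.
                for future in as_completed(futures):
                    future.result()  # re-raise any task exception
                    if callbacks is not None:
                        event = TaskEndEvent(name=name)
                        for callback in callbacks:
                            callback.on_task_end(event)

The only change from the single-threaded version is that task inputs are submitted to a pool and the task-end callbacks are issued as futures complete; Cubed's real local executors (in cubed.runtime.executors) are more involved, but the overall shape is similar.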

In practice, backends will have the following characteristics that make them suitable targets for Cubed:

  1. Parallel. The ability to efficiently run multiple tasks in parallel. (The task inputs are generated by the pipeline.mappable iterator.)
  2. Code distribution. The ability to run arbitrary code in the remote process via some distribution mechanism. (The task function is pipeline.function.)
  3. Memory guarantees. The backend should guarantee that the task gets a certain (configurable) amount of memory.
  4. Retries. The backend should have some way of retrying a task if it fails.
  5. Timeouts. Tasks should fail after a certain amount of time, so they can be retried.
  6. (Optional) Straggler mitigation. Very slow tasks are detected and retried with a backup task so as not to slow down the entire computation. Most backends do not support this, but it can be implemented as part of the executor.

These features (and a few more) are discussed in #276, which also has a table showing which executor has each feature.

Some of the executors in Cubed use asyncio APIs to call the backend (e.g. Modal, Dask, local threads and processes), which makes implementing some of these features easier since the code can be shared (e.g. backup tasks). However, backends do not have to offer an asyncio API to be integrated with Cubed (e.g. Lithops).
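
As a rough illustration (not Cubed's actual shared code), retries and timeouts can be layered over an async backend call like this, where run_remote_task is a hypothetical coroutine standing in for the backend's API:

import asyncio


async def run_with_retries(run_remote_task, task_input, timeout=300, retries=2):
    # Try the task up to retries + 1 times; each attempt is cancelled if it
    # exceeds the timeout, and the last failure is re-raised.
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(run_remote_task(task_input), timeout)
        except Exception:
            if attempt == retries:
                raise


async def run_stage(run_remote_task, mappable):
    # Run all of a stage's task inputs concurrently.
    await asyncio.gather(*(run_with_retries(run_remote_task, m) for m in mappable))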

All the executor implementations can be found in cubed.runtime.executors.

For testing, new executors can be conditionally added to the ALL_EXECUTORS list if the backend dependency is present.
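
For example, a new executor could be registered for the test suite along these lines (the module and class names here are made up):

# Skip the new executor when its backend dependency is not installed.
try:
    from cubed.runtime.executors.my_backend import MyBackendExecutor

    ALL_EXECUTORS.append(MyBackendExecutor())
except ImportError:
    pass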
