moirae* is a lightweight async workflow execution engine for Python that supports static DAGs.
*) The Moirae are Ancient Greek deities who ensured that every being, mortal and divine, lived out their destiny as it was assigned to them by the laws of the universe.
moirae requires python>=3.8.
*) On Windows, please use python>=3.9 because of an asyncio bug in the Python standard library.
pip install moirae
git clone https://github.com/lzqlzzq/moirae
pip install -e moirae
A Node is the basic unit of a workflow, where data is transformed. A Node class must inherit from moirae.Node, which is a subclass of pydantic.BaseModel. The definition of a Node class must include:
- An Input class: Must inherit from moirae.Data. Defines the input data of the node. moirae.Data is a subclass of pydantic.BaseModel, so it behaves almost the same as pydantic.BaseModel. It should also support the msgpack protocol for serializing and hashing.
- An Output class: Must inherit from moirae.Data. Defines the output data of the node. Like Input, it behaves almost the same as pydantic.BaseModel and should support the msgpack protocol for serializing and hashing. Note that the inputs and outputs of the node are checked by moirae.
- An execute function: Must be an async function. Defines how the input data is transformed into output data. An Input instance is passed into the execute function. The arguments of the node can be accessed through self.
- Optional arguments: Define arguments for the data transformation in execute. moirae.Node itself is a subclass of pydantic.BaseModel, so arguments are declared as fields and behave almost the same as on a pydantic.BaseModel. They should also support the msgpack protocol for serializing and hashing.
import moirae

class AddMul(moirae.Node):
    # Define input of this node
    class Input(moirae.Data):
        x: float
        y: float

    # Define output of this node
    class Output(moirae.Data):
        o: float

    # Define arguments of this node
    coef: int

    # Define execute of this node, write your logic here
    async def execute(self, inputs: Input) -> Output:
        added = inputs.x + inputs.y         # Make use of the input
        multiplied = self.coef * added      # Make use of the node's argument
        result = self.Output(o=multiplied)  # Must return a self.Output
        return result
Once defined, the class is registered in moirae.NODES:
print(moirae.NODES) # {'AddMul': <class '__main__.AddMul'>}
You can execute a node eagerly. The result is returned once the node has executed successfully. moirae checks that the output matches Node.Output.
# Initialize node instance
add_mul_instance1 = AddMul(coef=2.)
# Initialize input instance of the node
input = AddMul.Input(x=1, y=2)
# Eagerly execute the node
output = add_mul_instance1(input)
print(type(output)) # <class '__main__.AddMul.Output'>
print(output.o) # 6.0
Let's define two types of Node.
import asyncio

class Add(moirae.Node):
    class Input(moirae.Data):
        x: float
        y: float

    class Output(moirae.Data):
        o: float

    async def execute(self, inputs: Input):
        await asyncio.sleep(1)  # Simulate running time
        return self.Output(o=inputs.x + inputs.y)

class Multiply(moirae.Node):
    class Input(moirae.Data):
        x: float
        y: float

    class Output(moirae.Data):
        o: float

    async def execute(self, inputs: Input):
        await asyncio.sleep(3)  # Simulate running time
        return self.Output(o=inputs.x * inputs.y)
We can build a simple graph with three nodes. The graph is a dict[node_name: str, node_attr: dict].
These attributes must be in node_attr:
- node: The class name of the moirae.Node you defined.
- arguments: Arguments of the node.
- inputs: The input data. A value of the form ${node_name.node_output_variable_name} defines a data flow in the graph.
graph = {
    'a': {
        'node': 'Add',
        'arguments': {},
        'inputs': {
            'x': 1, 'y': 2  # a.o = (1 + 2)
        }
    },
    'b': {
        'node': 'Multiply',
        'arguments': {},
        'inputs': {
            'x': 3, 'y': 2  # b.o = (3 * 2)
        }
    },
    'c': {
        'node': 'Add',
        'arguments': {},
        'inputs': {
            'x': '${b.o}', 'y': '${a.o}'  # c.o = (b.o + a.o)
        }
    }
}
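To make the ${node_name.node_output_variable_name} syntax concrete, here is a minimal sketch of how such references could be turned into data-flow edges. It uses only the standard library and is not moirae's actual parser; the REFERENCE pattern and the extract_edges helper are hypothetical names introduced for illustration.

```python
import re

# Hypothetical pattern: matches strings of the form "${node_name.output_field}".
REFERENCE = re.compile(r"^\$\{(\w+)\.(\w+)\}$")


def extract_edges(graph: dict) -> list:
    """Return (source, target, attrs) tuples for every ${...} input reference."""
    edges = []
    for target, attrs in graph.items():
        for input_field, value in attrs["inputs"].items():
            match = REFERENCE.match(value) if isinstance(value, str) else None
            if match is None:
                continue  # A literal value, not a data-flow reference
            source, output_field = match.groups()
            if source not in graph:
                raise KeyError(f"{target}.{input_field} references unknown node {source!r}")
            edges.append(
                (source, target, {"output_field": output_field, "input_field": input_field})
            )
    # Sort by (source, target) so the result does not depend on dict order
    return sorted(edges, key=lambda edge: (edge[0], edge[1]))


example_graph = {
    "a": {"node": "Add", "arguments": {}, "inputs": {"x": 1, "y": 2}},
    "b": {"node": "Multiply", "arguments": {}, "inputs": {"x": 3, "y": 2}},
    "c": {"node": "Add", "arguments": {}, "inputs": {"x": "${b.o}", "y": "${a.o}"}},
}

edges = extract_edges(example_graph)
for edge in edges:
    print(edge)
# ('a', 'c', {'output_field': 'o', 'input_field': 'y'})
# ('b', 'c', {'output_field': 'o', 'input_field': 'x'})
```

Nodes a and b have only literal inputs, so they have no incoming edges and can start immediately; c has one incoming edge per referenced output.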
You can show the computation graph:
mg = moirae.Graph(graph)
print(mg.graph.nodes(data=True))
print(mg.graph.edges(data=True))
"""
inputs_schema: {}
args_schema: {'a': {'x': FieldInfo(annotation=float, required=True), 'y': FieldInfo(annotation=float, required=True)}, 'b': {'x': FieldInfo(annotation=float, required=True), 'y': FieldInfo(annotation=float, required=True)}}
outputs_schema: {'a': <class '__main__.Add.Output'>, 'b': <class '__main__.Multiply.Output'>, 'c': <class '__main__.Add.Output'>}
input_data: {'a': {'x': 1, 'y': 2}, 'b': {'x': 3, 'y': 2}, 'c': {'x': None, 'y': None}}
nodes: [('a', {'node': Add(), 'hash': 'bbdadddac55732cd29aa32d15e88cabdba9d9b064336105838fc57916d8e89a9e208f0b7ddd09d1946093f576f864da3ccf5660d5a25f8bf8ac1faf434acc97d'}), ('b', {'node': Multiply(), 'hash': '84fa63db9acf5dbfd94dd535e61cdcf7cdf8e39046d35a3091e299b1d1663c72df48b8f7a123f7434547a9528c5012021ce6d7b1325435d2a93e5c5f57609f20'}), ('c', {'node': Add(), 'hash': '8a2a9794af9da740059c6e92eed17ff70445babad56dffe646c110b6069903bf563b203feae86ba4fcc00fafb5b80b18da7b82e50a45aecfc4d2499863013bd4'})]
edges: [('a', 'c', {'output_field': 'o', 'input_field': 'y'}), ('b', 'c', {'output_field': 'o', 'input_field': 'x'})]
"""
Or visualize it with networkx and matplotlib:
import networkx as nx
import matplotlib.pyplot as plt
nx.draw(mg.graph, with_labels=True)
plt.show()
moirae implements an async flow executor. Each Node runs as soon as its prerequisites are fulfilled, without waiting on any other node.
import asyncio
from time import time

async def run_graph():
    print(f'[{time()}]: Start executing.')
    async with moirae.Executor(mg) as exe:
        async for (node_name, node_output) in exe:
            print(f'[{time()}]{node_name}: {node_output}')
    print(f'[{time()}]: Finish executing.')

if __name__ == "__main__":
    asyncio.run(run_graph())
# [1729154400.537708]: Start executing.
# [1729154401.5393991]a: o=3.0
# [1729154403.541043]b: o=6.0
# [1729154404.5419276]c: o=9.0
# [1729154404.5420897]: Finish executing.
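The scheduling idea behind these timings can be sketched in plain asyncio. This is an illustration under stated assumptions, not moirae's implementation: run_dag, durations and deps are hypothetical names, the running times are scaled down to tenths of a second, and the input is assumed to be acyclic (a cycle would make the tasks wait on each other forever).

```python
import asyncio
import time


async def run_dag(durations: dict, deps: dict) -> dict:
    """Run each node as soon as all of its prerequisites have finished.

    durations: node name -> simulated running time in seconds
    deps: node name -> list of prerequisite node names
    Returns node name -> finish time in seconds since the start.
    """
    start = time.perf_counter()
    finished = {}
    tasks = {}

    async def run(name: str) -> None:
        # Wait only for this node's own prerequisites, not for unrelated nodes
        await asyncio.gather(*(tasks[dep] for dep in deps.get(name, [])))
        await asyncio.sleep(durations[name])  # Simulate running time
        finished[name] = time.perf_counter() - start

    # Create every task first; none of them starts running until the
    # current coroutine awaits, so `tasks` is complete by then.
    for name in durations:
        tasks[name] = asyncio.create_task(run(name))
    await asyncio.gather(*tasks.values())
    return finished


finish_times = asyncio.run(
    run_dag(
        durations={"a": 0.1, "b": 0.3, "c": 0.1},
        deps={"c": ["a", "b"]},
    )
)
for name, seconds in sorted(finish_times.items(), key=lambda item: item[1]):
    print(f"{name}: finished after about {seconds:.1f}s")
```

Here a and b start together, so a finishes without waiting for the slower b, and c starts only once both are done. The total time is bounded by the longest dependency chain rather than by the sum of all running times.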
You can also use moirae.execute directly to execute the whole graph eagerly.
print(f'[{time()}]: Start executing.')
print(moirae.execute(mg))
print(f'[{time()}]: Finish executing.')
# [1729492804.0473106]: Start executing.
# {'a': Output(o=3.0), 'b': Output(o=6.0), 'c': Output(o=9.0)}
# [1729492808.0519385]: Finish executing.
moirae provides a cache mechanism based on topological hashing for storing intermediate results. If the cache hits, moirae tries to fetch the data from the cache instead of re-running the node. Implement a moirae.Cache class like this:
import os
import aiofiles  # pip install aiofiles

class FileCache(moirae.Cache):
    def __init__(self, root_dir: str):
        self.root_dir = root_dir

    async def exists(self, hash_key: str):
        return os.path.exists(os.path.join(self.root_dir, hash_key))

    async def get(self, hash_key: str):
        async with aiofiles.open(os.path.join(self.root_dir, hash_key), mode='rb') as f:
            return await f.read()

    async def put(self, hash_key: str, data_value: bytes):
        async with aiofiles.open(os.path.join(self.root_dir, hash_key), mode='wb') as f:
            await f.write(data_value)
These three async methods, exists, get, and put, must be implemented for a moirae.Cache class.
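As a smaller illustration of the same three-method interface, here is a dict-backed sketch that keeps everything in memory. MemoryCache is a hypothetical name, and the class deliberately does not inherit from moirae.Cache so that the snippet runs without moirae installed; in real use it would be declared as a moirae.Cache subclass like FileCache above.

```python
import asyncio


class MemoryCache:
    """Dict-backed cache exposing the three async methods: exists, get, put."""

    def __init__(self):
        self._store = {}

    async def exists(self, hash_key: str) -> bool:
        return hash_key in self._store

    async def get(self, hash_key: str) -> bytes:
        return self._store[hash_key]

    async def put(self, hash_key: str, data_value: bytes) -> None:
        self._store[hash_key] = data_value


async def demo() -> tuple:
    cache = MemoryCache()
    hit_before = await cache.exists("abc")      # Nothing stored yet
    await cache.put("abc", b"serialized output")
    hit_after = await cache.exists("abc")       # Now the key is present
    return hit_before, hit_after, await cache.get("abc")


result = asyncio.run(demo())
print(result)  # (False, True, b'serialized output')
```

An in-memory cache like this only lives as long as the process, so it suits tests and short sessions, whereas a file-based cache persists results across runs.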
Then execute with the cache argument:
async def execute_graph_async():
    mg = moirae.Graph(graph)
    print(f'[{time()}]: Start executing.')
    async for (node_name, node_output) in moirae.execute_async(mg, FileCache(".")):
        print(f'[{time()}]{node_name}: {node_output}')
    print(f'[{time()}]: Finish executing.')

if __name__ == "__main__":
    print('Testing execute without cache.')
    asyncio.run(execute_graph_async())
    print('Cache is stored!')
    asyncio.run(execute_graph_async())
This will output:
Testing execute without cache.
[1729241948.6246948]: Start executing.
[1729241949.6267285]a: o=3.0
[1729241951.6277223]b: o=6.0
[1729241952.6284506]c: o=9.0
[1729241952.6289535]: Finish executing.
Cache is stored!
[1729241952.6298018]: Start executing.
[1729241952.6310601]b: o=6.0
[1729241952.6312633]a: o=3.0
[1729241952.6314924]c: o=9.0
[1729241952.6403558]: Finish executing.
On the second run the cache is already populated, so moirae fetches the outputs directly from the cache instead of running the nodes.
Remember that we defined the Add node to cost 1 second and the Multiply node to cost 3 seconds. For example, if we modify the input of node a, moirae will reuse the cached output of node b and execute only nodes a and c, which costs only 2 seconds.
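Why only a and c are re-run can be illustrated with a small sketch of topological hashing. This is an assumption-laden illustration rather than moirae's real hashing scheme: it serializes with json instead of msgpack, uses SHA-512 from hashlib, and topological_hashes is a hypothetical helper name. It also assumes the graph is acyclic.

```python
import copy
import hashlib
import json
import re

# Hypothetical pattern for "${node_name.output_field}" references
REFERENCE = re.compile(r"^\$\{(\w+)\.(\w+)\}$")


def topological_hashes(graph: dict) -> dict:
    """Hash each node from its class name, arguments, inputs and upstream hashes."""
    hashes = {}

    def node_hash(name: str) -> str:
        if name in hashes:
            return hashes[name]
        attrs = graph[name]
        resolved = {}
        for field, value in attrs["inputs"].items():
            match = REFERENCE.match(value) if isinstance(value, str) else None
            if match:
                # A referenced input is represented by the upstream node's hash,
                # so any upstream change propagates to downstream hashes.
                source, output_field = match.groups()
                resolved[field] = {"from": node_hash(source), "field": output_field}
            else:
                resolved[field] = value
        payload = json.dumps(
            {"node": attrs["node"], "arguments": attrs["arguments"], "inputs": resolved},
            sort_keys=True,
        )
        hashes[name] = hashlib.sha512(payload.encode()).hexdigest()
        return hashes[name]

    for name in graph:
        node_hash(name)
    return hashes


original = {
    "a": {"node": "Add", "arguments": {}, "inputs": {"x": 1, "y": 2}},
    "b": {"node": "Multiply", "arguments": {}, "inputs": {"x": 3, "y": 2}},
    "c": {"node": "Add", "arguments": {}, "inputs": {"x": "${b.o}", "y": "${a.o}"}},
}
modified = copy.deepcopy(original)
modified["a"]["inputs"]["x"] = 10  # Change the input of node a only

before = topological_hashes(original)
after = topological_hashes(modified)
changed = [name for name in original if before[name] != after[name]]
print(changed)  # ['a', 'c']
```

Node b keeps its hash, so a cache keyed by these hashes still holds its output, while a and its downstream node c get new keys and must be executed again.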
- Complete unit tests
- Implement subgraph execution