Skip to content
Chris Lu edited this page Oct 24, 2015 · 1 revision

Preparing the execution plan

Let's say you have a "hello_world.go" file, and compiled into a "hello_word" binary file.

When "hello_word -glow" starts, it runs locally on your computer (duh...), and it will try to run a flow. It generates an optimized execution plan by merging steps, and divides the steps into task groups based on the number of partitions.

Fetch resources from master

A task group needs to run on an executor. So the driver program will ask the master for available executors.

The resources, cpu cores and memory, are reported to the master by agents through heartbeats.

However the master may not always give the exact number of executors that the driver is asking for, due to resource limitation, competition with other driver programs, etc. But since all communication cross servers are pull-based and asynchronous, glow runs fine with limited resources. In extreme case, a flow can proceed even with just 1 single executor.

And even some executors start to run, the driver program will still request resources from the master if the compute demand is still high.

If some tasks are finished early, the executors are re-used directly instead of return and request again.

Assign tasks to a server

For each allocated server, the driver program will assign a group of tasks to the agent that manages the server. Then the agent replicates the driver binary file if not cached already, and starts to run the binary file in "task" mode, by specifying "-glow.flow.id", "-glow.taskGroup.id", etc. The driver also tells the task group about the input dataset shard locations.

The driver will try to run tasks that have inputs ready or in computing process. So if task group 3 depends on outputs of task group 5, task group 5 will run first.

Task group runs on the server

Since all flows are statically defined, the tasks to run are deterministic given flow.id and taskGroup.id, and input dataset shard locations. The output is written to local agent. The inputs are retrieved from the location served by the remote agents.

Input Channels and Output Channels

A flow can also have input channels and output channels, through which the driver program will send and receive data. Ideally you should not pump a lot of data through these channels, but just data locations or succinct final results.

Clone this wiki locally