-
Notifications
You must be signed in to change notification settings - Fork 247
Home
In a glow program, datasets are linked together by computation steps. Each dataset can be partitioned into dataset shards. Each computation step can also be partitioned into computation tasks.
Glow make full use of Golang's channel feature.
In local mode, data are fed into computation tasks by input channel(s), and output also via a channel to a new dataset.
In distributed mode, a group of tasks will run together in a server, pulling its own input dataset shards. Its output are streamed to local disk, which will be pulled by downstream tasks. But these piping work are not affecting the computation since the tasks only input and output via channels.
One of the Golang's missing feature is the capability to move execution closure across the network, and not likely available any time soon. Given current situation, we can just move the whole binary code, but run in different modes, i.g., task mode and driver mode.
However, the implication is that the computation flow will be static. The flow graph can not be changed. One future way to allow dynamic flow is to pre-register all the flows, and dynamically choose one or several flows to run.
The driver program will generate an optimized execution plan, request resources from the glow master, and allocate tasks to assigned servers. The agents on the assigned servers will get the task request, fetch the binary code, plumbing the channels, execute the task, and save local outputs if any.
All glow agents send the reports of their cpu, memory, current allocations, etc, to glow master via heartbeats. Glow master will then assign available resources to each driver program.
Check this page first. https://github.com/chrislusf/glow_examples/tree/master/word_count
Fork it, code it, and send pull requests. Better first discuss about the feature you want on the mailing list. https://groups.google.com/forum/#!forum/glow-user-discussion