-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathtasks.tex
406 lines (347 loc) · 20.5 KB
/
tasks.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
\chapter{Tasks}
\label{chap:tasks}
% Set up the listing configuration for everything
\lstset{
captionpos=b,
language=C++,
basicstyle=\scriptsize,
numbers=left,
numberstyle=\tiny,
columns=fullflexible,
stepnumber=1,
escapechar=\#,
keepspaces=true,
literate={<}{{$\langle$}}1 {>}{{$\rangle$}}1,
%xleftmargin=15pt,
}
The Legion runtime is a \Cpp\ library, and
Legion programs are just C++ programs that use the Legion runtime API.
One important consequence of this design is that almost all Legion decisions
(such as what data layout to use, in which memories to place data and on which
processors to run computations) are made dynamically, during the execution of
a Legion application. Dynamic decision making provides maximum flexibility,
allowing the runtime's decisions to be reactive to the current state of the computation.
Implementing Legion as a \Cpp\ library also allows high performance \Cpp\ code
(e.g., vectorized kernels) to be used seamlessly in Legion applications.
In Legion, {\em tasks} are distinguished functions with a specific signature.
Legion tasks have several important properties:
\begin{itemize}
\item Tasks are the unit of parallelism in Legion; all parallelism occurs because tasks are executed in parallel.
\item Tasks have {\em variants} specific to a particular kind of {\em processor} (most commonly CPUs or GPUs, but there is also experimental support for FPGAs) and
memory layout of the task's arguments. A task may have multiple variants.
\item Once a task begins execution on a processor, that task will execute in its entirety on that processor---tasks do
not migrate mid-computation.
\end{itemize}
\begin{figure}
{\small
\lstinputlisting{Examples/Tasks/sum/sum.cc}
}
\caption{\legionbook{Tasks/sum/sum.cc}}
\label{fig:simple}
\end{figure}
Figure~\ref{fig:simple} shows a very simple, but complete, Legion program for summing
the first 1000 positive integers (also available as {\tt sum.cc} in \legionbook{Tasks}).
This example, like every other example in this manual, can be run by first setting an environment variable to point to the Legion runtime
\begin{verbatim}
export LG_RT_DIR="...path to legion directory.../legion/runtime"
\end{verbatim}
and then typing {\tt make} in the directory containing the example.
At a high level, every Legion program has three components:
\begin{itemize}
\item The id of the top-level task must be set with Legion's {\em high level runtime}. The top-level
task is the initial task that is called when the Legion runtime starts.
\item Every task and its task id must be registered with the high level runtime. Currently all tasks
must be registered before the runtime starts.
\item The start method of the high level runtime is invoked, which in turn calls the top-level task.
Note that by default this call does not return---the program is terminated when the start method terminates.
\end{itemize}
In Figure~\ref{fig:simple}, these three steps are the three statements of {\tt main}.
The only task in this program is {\tt sum\_task}, which is also the top-level task invoked when the
Legion runtime starts up. Note that the program does not say where the task is executed; that decision is made
at runtime by the {\em mapper} (see Chapter~\ref{chap:mapping}). Note also that tasks can perform almost arbitrary
\Cpp\ computations. In the case of {\tt sum\_task}, the computation performed is very simple, but in general tasks
can call ordinary \Cpp\ functions, including allocating and deallocating memory. Tasks must not, however,
call directly into other packages that provide parallelism or concurrency. Interoperation with OpenMP and MPI is
possible but must be done in a standardized way (see Chapter~\ref{chap:interop}).
As mentioned above, every task must be registered with the Legion runtime before
the runtime's {\tt start} method is called. Registration passes several arguments about a
task to the runtime:
\begin{itemize}
\item The name of the task is a template argument to the {\tt register\_legion\_task} method.
\item The task ID is the first (regular) argument.
\item The kind of processor the task can run on is the second argument. The most important options are
{\em latency optimized cores} or CPUs (constant {\tt LOC}) and {\em throughput optimized cores} or GPUs
(constant {\tt TOC}). Declaring the processor kind for a task is an example of a {\em constraint}.
Legion has an extensive system of constraints that can be used to direct the Legion runtime
in running a Legion program. There are other kinds of constraints that can be specified for tasks, but
the processor kind is the most commonly used.
\item Two boolean flags, the first of which indicates whether the task can be used in a single task
launch and the second of which indicates whether the task can be used in a multiple (or {\em index}) task
launch.
\end{itemize}
We will see shortly that tasks can call other tasks and pass
those tasks arguments and return results. Because the called task may
be executed in a different address space than the caller, arguments
passed between tasks must not contain \Cpp\ pointers, as these will
not make sense outside of the address space in which they were
created. Neither should tasks refer to global variables. A common
programming error for beginning Legion programmers is to pass
\Cpp\ pointers or references between tasks, or to refer to global
variables from within tasks. As long as all the tasks are mapped to a
single node (i.e., the same address space) the program is likely to
work, but when efforts are made to scale up the application by running
on multiple nodes, \Cpp\ crashes result from the wild pointers or
references to distinct instances of global variables of the same name
in different address spaces. Legion provides its own abstractions for
passing data structures between tasks (see
Chapter~\ref{chap:regions}).
All tasks have the same input signature as {\tt sum\_task}:
\begin{itemize}
\item {\tt const Task *task}: An object representing the task itself.
\item {\tt const std::vector<PhysicalRegion> \®ions}: A vector of {\em physical region instances}. This argument is the
primary way to pass data between tasks (see Chapter~\ref{chap:regions}).
\item {\tt Context ctx}: Every task is called in a context, which contains metadata for the task. Application programs
should not directly manipulate the context.
\item {\tt Runtime *runtime}: A pointer to the runtime, which gives the task access to the Legion runtime's methods.
\end{itemize}
\section{Subtasks}
\label{sec:subtasks}
\begin{figure}
{\small
\lstinputlisting{Examples/Tasks/subtasks/subtasks.cc}
}
\caption{\legionbook{Tasks/subtasks/subtasks.cc}}
\label{fig:subtask}
\end{figure}
Task can call other tasks, known as {\em subtasks}. We also refer to
the calling task as the {\em parent task} and the called task as the
{\em child task}. Two or more child tasks of the same parent task are
{\em sibling tasks}. Figure~\ref{fig:subtask} shows the definition of
the parent task and the child task from the example
\legionbook{Tasks/subtask/subtask.cc}.
Consider the parent task {\tt top\_level\_task}. There are two steps
to executing a subtask. First, a {\tt TaskLauncher} object is
created. The {\tt TaskLauncher}
constructor takes two arguments, the ID of the task to be called and a
{\tt TaskArgument} object that holds a pointer to a buffer containing
data for the subtask together with the size of the buffer. The
semantics of the task arguments are particularly important. Recall
that a task may be run on any processor in the system (of a kind that
can execute the task). Thus, the parent task and the child task may
run in different address spaces, and so the arguments are passed
{\em by value}, meaning that the buffer pointed to by the {\tt TaskArgument} is
copied to where the subtask runs. Even if the subtask happens to run in the
same address space as the parent task, the buffer referenced by the {\tt
TaskArgument} is passed by value (i.e., copied).
{\tt TaskArgument} objects should be used to pass small amounts of data,
such as an integer, float, struct or a (very) small array. To pass large amounts of
data, use {\em regions} (see Chapter~\ref{chap:regions}). As
discussed earlier in this chapter, task arguments may not contain
\Cpp\ pointers or references. In addition, task arguments may not contain
futures (see Section~\ref{sec:futures}).
A subtask is actually launched by the {\tt runtime->execute\_task}
method, which requires both the parent task's context and the {\tt
TaskLauncher} object for the subtask as arguments. Note that
the argument buffer pointed to by the {\tt TaskArgument} is copied
only when {\tt execute\_task} is called. On the callee's side, note
that the task arguments are available as a field of the {\tt task}
object. Since \Cpp\ doesn't know the type of the buffer, it is
necessary to first cast the pointer to the buffer to the correct type
before it can be used.
Finally, there are two other important properties of subtasks. First,
the {\tt execute\_task} method is {\em non-blocking}, meaning it
returns immediately and the subtask is executed asynchronously from
the parent task, allowing the parent task to continue executing while
the subtask is running (potentially) in parallel. In {\tt
subtask.cc}, the parent task launches all of the subtasks in a loop,
sending each subtask a unique integer argument that the subtask simply prints
out. Compile and run {\tt subtask.cc} and observe that the
parent task reports that it is done launching all of the subtasks
before all of the subtasks execute. Second, a parent task does not
terminate until all of its child tasks have terminated. Thus, even
though {\tt top\_level\_task} reaches the end of its function body
before all of its child tasks have completed, at that point the parent
task waits until all the child tasks terminate, at which point
{\tt top\_level\_task} itself terminates.
\section{Futures}
\label{sec:futures}
\begin{figure}
{\small
\lstinputlisting[linerange={14-48}]{Examples/Tasks/futures/futures.cc}}
\caption{\legionbook{Tasks/futures/futures.cc}}
\label{fig:futures}
\end{figure}
In addition to taking arguments, subtasks may also return results.
However, because a subtask executes asynchronously from its parent
task, there is no guarantee that the result of the subtask will be
available when the parent task or another task attempts to use it. A
standard solution to this problem is to provide {\em futures}. A future
is a value that, if read, causes the task that is performing the
read to block if necessary until the value is available.
Figure~\ref{fig:futures} shows an excerpt from {\tt futures.cc}, which
is an extension of {\tt substask.cc} from
Section~\ref{sec:subtasks}. In this example, there are two subtasks,
a producer and a consumer. The top level task repeatedly calls
\mbox{producer/consumer} pairs in a loop. The top level task first calls the
producer task, passing it a unique odd integer, which the producer
prints out. The producer returns a unique even integer as a future.
The top level task then passes this future to a consumer task that
reads and prints the number.
The launch of the producer task is exactly as before in
Figure~\ref{fig:subtask}. Unlike in that example, however, the
producer subtask has a non-void return value, and so the {\tt
runtime->execute\_task} invocation returns a useful result of type
{\tt Future}. Note that the future is passed to the consumer task
using the {\tt add\_future} method of the {\tt TaskLauncher} class, not
through the {\tt TaskArgument} object used to construct the {\tt
TaskLauncher}; futures must always be passed as arguments using {\tt add\_future}
and must not be included in {\tt TaskArgument}s. Having a distinguished method for tracking arguments
to tasks that are futures allows the Legion runtime to track
{\em dependencies} between tasks. In this case, the Legion runtime
will know that the consumer task depends on the result of the corresponding
producer task.
Legion gives access to the value of a future through the {\tt
get\_result} method of the {\tt Future} class, as shown in the code
for {\tt subtask\_consumer} in Figure~\ref{fig:futures}. (Note that
{\tt get\_result} is templated on the type of value the future holds.)
There are two interesting cases of tasks reading from futures:
\begin{itemize}
\item If a parent task attempts to access a future returned
by one of its child tasks that has not yet completed, the parent task
will block until the value of the future is available. This behavior is the
standard semantics for futures, as described above. In Legion, however,
this style of programming is discouraged, as blocking operations are
generally detrimental to achieving the highest possible performance.
\item Figure~\ref{fig:futures} illustrates idiomatic use of futures in Legion:
a future returned by one subtask is passed as an argument to another subtask.
Because Legion knows the consumer task depends on the producer task, the consumer
task will not be run by the Legion runtime until the producer task has terminated.
Thus, all references to the future in the consumer task are guaranteed to
return immediately, without blocking.
\end{itemize}
\section{Points, Rectangles and Domains}
Up to this point we have discussed individual tasks. Legion also provides mechanisms for
naming and launching sets of tasks. The ability to name and manipulate sets of things, and
in particular sets of points, is useful for
more than dealing with sets of tasks, and so we first present the general mechanism in
Legion for defining {\em points}, {\em rectangles} and {\em domains}.
A {\em point} is an n-tuple of integers. The {\tt Point} constructor, which is templated on the
dimension $n$, is used to create points:
\begin{verbatim}
Point<1> one(1); // The 1 dimensional point <1>
Point<1> two(2); // The 1 dimensional point <2>
Point<2> zeroes(0,0); // The 2 dimensional point <0,0>
Point<2> twos(2,2); // The 2 dimensional point <2,2>
Point<2> threes(3,3); // The 2 dimensional point <3,3>
Point<3> fours(4,4,4); // The 3 dimensional point <4,4,4>
\end{verbatim}
There are many operations defined on points. For example, points can be summed:
\begin{verbatim}
twos + threes // the point <5,5>
\end{verbatim}
and one can take the dot product of two points:
\begin{verbatim}
twos.dot(threes) // the integer 12
\end{verbatim}
The following are true:
\begin{verbatim}
twos == twos
twos != threes
\end{verbatim}
A pair of points $a$ and $b$ defines a {\em rectangle} that includes all the points that are greater than or equal to $a$
and less than or equal to $b$. For example:
\begin{verbatim}
// the points <0,0> <0,1> <0,2> <0,3>
// <1,0> <1,1> <1,2> <1,3>
// <2,0> <2,1> <2,2> <2,3>
// <3,0> <3,1> <3,2> <3,3>
Rect<2> big(zeroes,threes);
// the points <2,2> <2,3>
// <3,2> <3,3>
Rect<2> small(twos,threes);
\end{verbatim}
There are also many operations defined on rectangles. A few examples, all of which evaluate to true:
\begin{verbatim}
big != small
big.contains(small)
small.overlaps(big)
small.intersection(big) == small
\end{verbatim}
Note that the intersection of two rectangles is always a rectangle.
A {\em domain} is an alternative type for rectangles. A {\tt Rect} can be converted to a {\tt Domain}:
\begin{verbatim}
Domain bigdomain = big;
\end{verbatim}
The difference between the two types is that {\tt Rect}s are templated on the dimension of the rectangle, while {\tt Domains}
are not. Legion runtime methods generally take {\tt Domain} arguments and use {\tt Domain}s internally, but for application
code the extra type checking provided by the {\tt Rect} type (which ensures that the operations are applied to {\tt Rect} arguments
with compatible dimensions) is useful. The recommended programming style is to create {\tt Rect}s and convert them to {\tt Domain}s
at the point of a Legion runtime call. Most of these type conversions will be handled implicitly---the programmer usually does not need to explicitly cast
a {\tt Rect} to a {\tt Domain}. It is also possible to work directly with the {\tt Domain} type, which has many
of the same methods as {\tt Rect} (see {\tt lowlevel.h} in the {\tt runtime/} directory).
Analagous to {\tt Rect} and {\tt Domain}, there is a less-typed version of the type {\tt Point} called {\tt DomainPoint}.
Again, the difference between the two types is that the {\tt Point} class is templated on the number of dimensions
while {\tt DomainPoint} is not. For Legion methods that require a {\tt DomainPoint}, there is a function to convert a
{\tt Point}:
\begin{verbatim}
DomainPoint dtwos = twos;
\end{verbatim}
As before, most Legion runtime calls take {\tt DomainPoints}, but programmers should probably prefer using the {\tt Point} type
for the extra type checking provided.
The example program \legionbook{Tasks/domains/domains.cc} includes all of the examples in this section and more.
\section{Index Launches}
\label{sec:indexlaunch}
We now return to the Legion mechanisms for launching multiple tasks in a
single operation. The main reason for using such {\em index launches}
is efficiency, as the overhead of starting $n$ tasks with a single
call is much less than launching $n$ separate tasks, and the
difference in performance only grows with $n$. Thus, when launching
even tens of tasks, an index launch should be used if possible. Not
all sets of tasks can be initiated using an index launch;
index launches are for executing multiple instances of the same task
where all of the task instances can run in parallel.
\begin{figure}
{\small
\lstinputlisting[linerange={47-90}]{Examples/Tasks/indexlaunch/indexlaunch.cc}}
\caption{\legionbook{Tasks/indexlaunch/indexlaunch.cc}}
\label{fig:indexlaunch}
\end{figure}
Figure~\ref{fig:indexlaunch} implements the same computation as the example in
Figure~\ref{fig:futures}, but instead of launching a single
producer and consumer pair at a time, in Figure~\ref{fig:indexlaunch}
all of the producers are launched in a single Legion runtime call,
followed by another single call to launch all of the consumers.
We now work through this example in detail, as it introduces several
new Legion runtime calls. First a one dimensional {\tt Rect}
{\tt launch\_domain} is created with the points {\tt 1..points}, where
{\tt points} is set to 50. Note that while the application code uses {\tt Rects} and {\tt Points} that the signatures of the runtime interfaces that are called use {\tt Domains} and {\tt DomainPoints} and Legion takes care of the conversions.
When launching multiple tasks simultaneously, we need some way to
describe for each task what argument it should receive. There are two
kinds of arguments that Legion supports: arguments that are common to
all tasks (i.e., the same value is passed to all the tasks) and
arguments that are specific to a particular task.
Figure~\ref{fig:indexlaunch} illustrates how to pass a (potentially)
different argument to each subtask. An {\tt ArgumentMap} maps a point
(specifically, a {\tt DomainPoint}) $p$ in the task index space to an
argument for task $p$. In the figure, the {\tt ArgumentMap} maps $p$
to $2p$. Note that an {\tt ArgumentMap} does not need to name an argument
for every point in the index space.
The procedure for launching a set of tasks is analogous to launching a
single task. Following standard Legion practice, we first create a
class derived from {\tt IndexLauncher} for each kind of task we will use in an
index launch. These classes, {\tt ProducerTasks} and {\tt ConsumerTasks} in this
example, encapsulate all of the information about the index task launch that is the
same across all calls (e.g., the task id to be launched). The {\tt ProducerTasks}
index launcher takes the launch domain and an argument map.
Executing the {\tt runtime->execute\_index\_space} method invokes all of the tasks
in the launch domain.
The {\tt execute\_task\_space} for the producer tasks returns not
a single {\tt Future}, but a {\tt FutureMap}, which maps each point in the index
space to a {\tt Future}. Figure~\ref{fig:indexlaunch} shows one way to
use the {\tt FutureMap} by converting it to an {\tt ArgumentMap} that is passed to
the index launch for the consumer tasks. Note that the launch of the consumer subtasks
does not block waiting for all of the futures to be resolved; instead, each consumer subtask
runs only after the future it depends on is resolved.
The subtask definitions are straightforward. Note that the argument specific to the subtask is
in the field {\tt task->local\_args}. Also note that when the consumer task actually runs
the argument is not a future, but a fully evaluated {\tt int}.