Commit
updated unittest for dep_async
twhuang committed Jun 13, 2023
1 parent a0e218c commit 9316d98
Showing 24 changed files with 649 additions and 133 deletions.
6 changes: 3 additions & 3 deletions docs/DependentAsyncTasking.html
@@ -56,7 +56,7 @@ <h3>Contents</h3>
<li><a href="#SpecifyARagneOfDependentAsyncTasks">Specify a Range of Dependent Async Tasks</a></li>
<li><a href="#UnderstandTheLifeTimeOfADependentAsyncTask">Understand the Lifetime of a Dependent Async Task</a></li>
<li><a href="#CreateADynamicTaskGraphByMultipleThreads">Create a Dynamic Task Graph by Multiple Threads</a></li>
<li><a href="#QueryTheComppletionStatusOfADependentAsyncTask">Query the Completion Status of a Dependent Async Task</a></li>
<li><a href="#QueryTheComppletionStatusOfDependentAsyncTasks">Query the Completion Status of Dependent Async Tasks</a></li>
</ul>
</nav>
<p>This chapters discusses how to create a task graph dynamically using asynchronous tasks, which is extremely beneficial for workloads that want to (1) explore task graph parallelism out of dynamic control flow or (2) overlap task graph creation time with individual task execution time. We recommend that you first read <a href="AsyncTasking.html" class="m-doc">Asynchronous Tasking</a> before digesting this chapter.</p><section id="CreateADynamicTaskGraph"><h2><a href="#CreateADynamicTaskGraph">Create a Dynamic Task Graph</a></h2><p>When the construct-and-run model of a task graph is not possible in your application, you can use <a href="classtf_1_1Executor.html#aee02b63d3a91ad5ca5a1c0e71f3e128f" class="m-doc">tf::<wbr />Executor::<wbr />dependent_async</a> and <a href="classtf_1_1Executor.html#a0e2d792f28136b8227b413d0c27d5c7f" class="m-doc">tf::<wbr />Executor::<wbr />silent_dependent_async</a> to create a task graph dynamically. This type of parallelism is also known as <em>on-the-fly</em> task graph parallelism, which offers great flexibility for expressing dynamic task graph parallelism. The example below dynamically creates a task graph of four dependent async tasks, <code>A</code>, <code>B</code>, <code>C</code>, and <code>D</code>, where <code>A</code> runs before <code>B</code> and <code>C</code> and <code>D</code> runs after <code>B</code> and <code>C:</code></p><div class="m-graph"><svg style="width: 24.200rem; height: 9.800rem;" viewBox="0.00 0.00 242.00 98.00">
@@ -147,7 +147,7 @@ <h3>Contents</h3>

<span class="n">executor</span><span class="p">.</span><span class="n">wait_for_all</span><span class="p">();</span><span class="w"></span>
<span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span><span class="w"></span>
<span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span><span class="w"></span></pre><p>Regardless of <code>t1</code> runs before or after <code>t2</code>, the resulting topological order is always correct with the graph definition, either <code>ABC</code> or <code>ACB</code>.</p></section><section id="QueryTheComppletionStatusOfADependentAsyncTask"><h2><a href="#QueryTheComppletionStatusOfADependentAsyncTask">Query the Completion Status of a Dependent Async Task</a></h2><p>When you create a dependent async task, you can query its completion status by <a href="classtf_1_1AsyncTask.html#aefeefa30d7cafdfbb7dc8def542e8e51" class="m-doc">tf::<wbr />AsyncTask::<wbr />is_done</a>, which returns <code>true</code> upon completion or <code>false</code> otherwise. A completed dependent async task indicates that a worker has executed its associated callable.</p><pre class="m-code"><span class="c1">// create a dependent async task that returns 100</span>
<span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span><span class="w"></span></pre><p>Regardless of <code>t1</code> runs before or after <code>t2</code>, the resulting topological order is always correct with the graph definition, either <code>ABC</code> or <code>ACB</code>.</p></section><section id="QueryTheComppletionStatusOfDependentAsyncTasks"><h2><a href="#QueryTheComppletionStatusOfDependentAsyncTasks">Query the Completion Status of Dependent Async Tasks</a></h2><p>When you create a dependent async task, you can query its completion status by <a href="classtf_1_1AsyncTask.html#aefeefa30d7cafdfbb7dc8def542e8e51" class="m-doc">tf::<wbr />AsyncTask::<wbr />is_done</a>, which returns <code>true</code> upon completion or <code>false</code> otherwise. A completed dependent async task indicates that a worker has executed its associated callable.</p><pre class="m-code"><span class="c1">// create a dependent async task that returns 100</span>
<span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">task</span><span class="p">,</span><span class="w"> </span><span class="n">fu</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="n">dependent_async</span><span class="p">([](){</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="p">});</span><span class="w"></span>

<span class="c1">// loops until the dependent async task completes</span>
@@ -167,7 +167,7 @@ <h3>Contents</h3>
<span class="p">};</span><span class="w"></span>

<span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">task</span><span class="p">,</span><span class="w"> </span><span class="n">fib11</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="n">dependent_async</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="n">fibonacci</span><span class="p">,</span><span class="w"> </span><span class="mi">11</span><span class="p">));</span><span class="w"></span>
<span class="n">assert</span><span class="p">(</span><span class="n">fib11</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">89</span><span class="p">);</span><span class="w"> </span><span class="c1">// the 11-th Fibonacci number is 89</span></pre><p>{.cpp}</p></section>
<span class="n">assert</span><span class="p">(</span><span class="n">fib11</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">89</span><span class="p">);</span><span class="w"> </span><span class="c1">// the 11-th Fibonacci number is 89</span></pre></section>
</div>
</div>
</div>
103 changes: 103 additions & 0 deletions docs/classtf_1_1FlowBuilder.html
@@ -190,6 +190,17 @@ <h2><a href="#pub-methods">Public functions</a></h2>
P&amp;&amp; part = P()) -&gt; <a href="classtf_1_1Task.html" class="m-doc">Task</a></span>
</dt>
<dd>constructs an STL-styled parallel transform-reduce task</dd>
<dt>
<div class="m-doc-template">template&lt;typename B1, typename E1, typename B2, typename T, typename BOP_R, typename BOP_T, typename P = <a href="classtf_1_1GuidedPartitioner.html" class="m-doc">GuidedPartitioner</a>&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a7099ef62158a6e0770bc8ceef1961326" class="m-doc">transform_reduce</a>(</span><span class="m-doc-wrap">B1 first1,
E1 last1,
B2 first2,
T&amp; init,
BOP_R bop_r,
BOP_T bop_t,
P&amp;&amp; part = P()) -&gt; <a href="classtf_1_1Task.html" class="m-doc">Task</a></span>
</dt>
<dd>constructs an STL-styled parallel transform-reduce task</dd>
<dt>
<div class="m-doc-template">template&lt;typename B, typename E, typename D, typename BOP&gt;</div>
<span class="m-doc-wrap-bumper">auto <a href="#a1c2ace9290d83c2a006614a4d66ad588" class="m-doc">inclusive_scan</a>(</span><span class="m-doc-wrap">B first,
@@ -1129,6 +1140,98 @@ <h3>
</table>
<p>The task spawns asynchronous tasks to perform parallel reduction over <code>init</code> and the transformed elements in the range <code>[first, last)</code>. The reduced result is store in <code>init</code>. This method is equivalent to the parallel execution of the following loop:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">!=</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">init</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bop</span><span class="p">(</span><span class="n">init</span><span class="p">,</span><span class="w"> </span><span class="n">uop</span><span class="p">(</span><span class="o">*</span><span class="n">itr</span><span class="p">));</span><span class="w"></span>
<span class="p">}</span><span class="w"></span></pre><p>Iterators are templated to enable stateful range using <a href="http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper.html" class="m-doc-external">std::<wbr />reference_wrapper</a>.</p><p>Please refer to <a href="ParallelReduction.html" class="m-doc">Parallel Reduction</a> for details.</p>
</div></section>
<section class="m-doc-details" id="a7099ef62158a6e0770bc8ceef1961326"><div>
<h3>
<div class="m-doc-template">
template&lt;typename B1, typename E1, typename B2, typename T, typename BOP_R, typename BOP_T, typename P = <a href="classtf_1_1GuidedPartitioner.html" class="m-doc">GuidedPartitioner</a>&gt;
</div>
<span class="m-doc-wrap-bumper"><a href="classtf_1_1Task.html" class="m-doc">Task</a> tf::<wbr />FlowBuilder::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a7099ef62158a6e0770bc8ceef1961326" class="m-doc-self">transform_reduce</a>(</span><span class="m-doc-wrap">B1 first1,
E1 last1,
B2 first2,
T&amp; init,
BOP_R bop_r,
BOP_T bop_t,
P&amp;&amp; part = P())</span></span>
</h3>
<p>constructs an STL-styled parallel transform-reduce task</p>
<table class="m-table m-fullwidth m-flat">
<thead>
<tr><th colspan="2">Template parameters</th></tr>
</thead>
<tbody>
<tr>
<td style="width: 1%">B1</td>
<td>first beginning iterator type</td>
</tr>
<tr>
<td>E1</td>
<td>first ending iterator type</td>
</tr>
<tr>
<td>B2</td>
<td>second beginning iterator type</td>
</tr>
<tr>
<td>T</td>
<td>result type</td>
</tr>
<tr>
<td>BOP_R</td>
<td>binary reducer type</td>
</tr>
<tr>
<td>BOP_T</td>
<td>binary transformion type</td>
</tr>
<tr>
<td>P</td>
<td>partitioner type (default <a href="classtf_1_1GuidedPartitioner.html" class="m-doc">tf::<wbr />GuidedPartitioner</a>)</td>
</tr>
</tbody>
<thead>
<tr><th colspan="2">Parameters</th></tr>
</thead>
<tbody>
<tr>
<td>first1</td>
<td></td>
</tr>
<tr>
<td>last1</td>
<td></td>
</tr>
<tr>
<td>first2</td>
<td></td>
</tr>
<tr>
<td>init</td>
<td>initial value of the reduction and the storage for the reduced result</td>
</tr>
<tr>
<td>bop_r</td>
<td>binary operator that will be applied in unspecified order to the results of <code>bop_t</code></td>
</tr>
<tr>
<td>bop_t</td>
<td>binary operator that will be applied to transform each element in the range to the result type</td>
</tr>
<tr>
<td>part</td>
<td>partitioning algorithm to schedule parallel iterations</td>
</tr>
</tbody>
<tfoot>
<tr>
<th>Returns</th>
<td>a <a href="classtf_1_1Task.html" class="m-doc">tf::<wbr />Task</a> handle</td>
</tr>
</tfoot>
</table>
<p>The task spawns asynchronous tasks to perform parallel reduction over <code>init</code> and the transformed elements in the range <code>[first, last)</code>. The reduced result is store in <code>init</code>. This method is equivalent to the parallel execution of the following loop:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr1</span><span class="o">=</span><span class="n">first1</span><span class="p">,</span><span class="w"> </span><span class="n">itr2</span><span class="o">=</span><span class="n">first2</span><span class="p">;</span><span class="w"> </span><span class="n">itr1</span><span class="o">!=</span><span class="n">last1</span><span class="p">;</span><span class="w"> </span><span class="n">itr1</span><span class="o">++</span><span class="p">,</span><span class="w"> </span><span class="n">itr2</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">init</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bop_r</span><span class="p">(</span><span class="n">init</span><span class="p">,</span><span class="w"> </span><span class="n">bop_t</span><span class="p">(</span><span class="o">*</span><span class="n">itr1</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">itr2</span><span class="p">));</span><span class="w"></span>
<span class="p">}</span><span class="w"></span></pre><p>Iterators are templated to enable stateful range using <a href="http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper.html" class="m-doc-external">std::<wbr />reference_wrapper</a>.</p><p>Please refer to <a href="ParallelReduction.html" class="m-doc">Parallel Reduction</a> for details.</p>
</div></section>
<section class="m-doc-details" id="a1c2ace9290d83c2a006614a4d66ad588"><div>
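
Note on the added two-range transform_reduce overload: the new documentation describes it as equivalent to a loop that walks two ranges in lockstep, combines each pair with bop_t, and folds the results into init with bop_r. The sketch below is a plain sequential reference for that loop, not the library's parallel implementation. The names sequential_transform_reduce and dot_product_example are hypothetical and do not appear in the commit; only the parameter names (first1, last1, first2, init, bop_r, bop_t) are taken from the documented signature.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Sequential reference for the loop quoted in the added documentation.
// NOTE: sequential_transform_reduce is a hypothetical helper for illustration,
// not a Taskflow function. It runs on the calling thread only.
template <typename B1, typename E1, typename B2, typename T,
          typename BOP_R, typename BOP_T>
void sequential_transform_reduce(B1 first1, E1 last1, B2 first2, T& init,
                                 BOP_R bop_r, BOP_T bop_t) {
  B2 itr2 = first2;
  for (B1 itr1 = first1; itr1 != last1; ++itr1, ++itr2) {
    // bop_t transforms one pair of elements; bop_r folds it into init.
    init = bop_r(init, bop_t(*itr1, *itr2));
  }
}

// Hypothetical usage: a dot product, with addition as the reducer and
// multiplication as the pairwise transform.
inline int dot_product_example() {
  std::vector<int> a{1, 2, 3};
  std::vector<int> b{4, 5, 6};
  int init = 0;
  sequential_transform_reduce(a.begin(), a.end(), b.begin(), init,
                              std::plus<int>{}, std::multiplies<int>{});
  return init;  // 1*4 + 2*5 + 3*6
}
```

Because the documentation states that bop_r is applied in unspecified order, a parallel run should only be expected to match this sequential result when bop_r is associative and commutative, as addition is here.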