updated unittest for dep_async

shards-lang · Jun 13, 2023 · 9316d98
1 parent a0e218c

Showing 24 changed files with 649 additions and 133 deletions.
diff --git a/docs/DependentAsyncTasking.html b/docs/DependentAsyncTasking.html
@@ -56,7 +56,7 @@ <h3>Contents</h3>
  <li><a href="#SpecifyARagneOfDependentAsyncTasks">Specify a Range of Dependent Async Tasks</a></li>
  <li><a href="#UnderstandTheLifeTimeOfADependentAsyncTask">Understand the Lifetime of a Dependent Async Task</a></li>
  <li><a href="#CreateADynamicTaskGraphByMultipleThreads">Create a Dynamic Task Graph by Multiple Threads</a></li>
- <li><a href="#QueryTheComppletionStatusOfADependentAsyncTask">Query the Completion Status of a Dependent Async Task</a></li>
+ <li><a href="#QueryTheComppletionStatusOfDependentAsyncTasks">Query the Completion Status of Dependent Async Tasks</a></li>
  </ul>
  </nav>
 <p>This chapter discusses how to create a task graph dynamically using asynchronous tasks, which is extremely beneficial for workloads that want to (1) explore task graph parallelism out of dynamic control flow or (2) overlap task graph creation time with individual task execution time. We recommend that you first read <a href="AsyncTasking.html" class="m-doc">Asynchronous Tasking</a> before digesting this chapter.</p><section id="CreateADynamicTaskGraph"><h2><a href="#CreateADynamicTaskGraph">Create a Dynamic Task Graph</a></h2><p>When the construct-and-run model of a task graph is not possible in your application, you can use <a href="classtf_1_1Executor.html#aee02b63d3a91ad5ca5a1c0e71f3e128f" class="m-doc">tf::<wbr />Executor::<wbr />dependent_async</a> and <a href="classtf_1_1Executor.html#a0e2d792f28136b8227b413d0c27d5c7f" class="m-doc">tf::<wbr />Executor::<wbr />silent_dependent_async</a> to create a task graph dynamically. This type of parallelism is also known as <em>on-the-fly</em> task graph parallelism, which offers great flexibility for expressing dynamic task graph parallelism. The example below dynamically creates a task graph of four dependent async tasks, <code>A</code>, <code>B</code>, <code>C</code>, and <code>D</code>, where <code>A</code> runs before <code>B</code> and <code>C</code>, and <code>D</code> runs after <code>B</code> and <code>C</code>:</p><div class="m-graph"><svg style="width: 24.200rem; height: 9.800rem;" viewBox="0.00 0.00 242.00 98.00">
@@ -147,7 +147,7 @@ <h3>Contents</h3>
 
 <span class="n">executor</span><span class="p">.</span><span class="n">wait_for_all</span><span class="p">();</span><span class="w"></span>
 <span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span><span class="w"></span>
-<span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span><span class="w"></span></pre><p>Regardless of whether <code>t1</code> runs before or after <code>t2</code>, the resulting topological order is always consistent with the graph definition, either <code>ABC</code> or <code>ACB</code>.</p></section><section id="QueryTheComppletionStatusOfADependentAsyncTask"><h2><a href="#QueryTheComppletionStatusOfADependentAsyncTask">Query the Completion Status of a Dependent Async Task</a></h2><p>When you create a dependent async task, you can query its completion status with <a href="classtf_1_1AsyncTask.html#aefeefa30d7cafdfbb7dc8def542e8e51" class="m-doc">tf::<wbr />AsyncTask::<wbr />is_done</a>, which returns <code>true</code> upon completion or <code>false</code> otherwise. A completed dependent async task indicates that a worker has executed its associated callable.</p><pre class="m-code"><span class="c1">// create a dependent async task that returns 100</span>
+<span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span><span class="w"></span></pre><p>Regardless of whether <code>t1</code> runs before or after <code>t2</code>, the resulting topological order is always consistent with the graph definition, either <code>ABC</code> or <code>ACB</code>.</p></section><section id="QueryTheComppletionStatusOfDependentAsyncTasks"><h2><a href="#QueryTheComppletionStatusOfDependentAsyncTasks">Query the Completion Status of Dependent Async Tasks</a></h2><p>When you create a dependent async task, you can query its completion status with <a href="classtf_1_1AsyncTask.html#aefeefa30d7cafdfbb7dc8def542e8e51" class="m-doc">tf::<wbr />AsyncTask::<wbr />is_done</a>, which returns <code>true</code> upon completion or <code>false</code> otherwise. A completed dependent async task indicates that a worker has executed its associated callable.</p><pre class="m-code"><span class="c1">// create a dependent async task that returns 100</span>
 <span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">task</span><span class="p">,</span><span class="w"> </span><span class="n">fu</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="n">dependent_async</span><span class="p">([](){</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="p">});</span><span class="w"></span>
 
 <span class="c1">// loops until the dependent async task completes</span>
@@ -167,7 +167,7 @@ <h3>Contents</h3>
 <span class="p">};</span><span class="w"></span>
 
 <span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">task</span><span class="p">,</span><span class="w"> </span><span class="n">fib11</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">executor</span><span class="p">.</span><span class="n">dependent_async</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="n">fibonacci</span><span class="p">,</span><span class="w"> </span><span class="mi">11</span><span class="p">));</span><span class="w"></span>
-<span class="n">assert</span><span class="p">(</span><span class="n">fib11</span><span class="p">.</span><span class="n">get</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">89</span><span class="p">);</span><span class="w"> </span><span class="c1">// the 11-th Fibonacci number is 89</span></pre><p>{.cpp}</p></section>
+<span class="n">assert</span><span class="p">(</span><span class="n">fib11</span><span class="p">.</span><span class="n">get</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">89</span><span class="p">);</span><span class="w"> </span><span class="c1">// the 11-th Fibonacci number is 89</span></pre></section>
  </div>
  </div>
  </div>

diff --git a/docs/classtf_1_1FlowBuilder.html b/docs/classtf_1_1FlowBuilder.html
@@ -190,6 +190,17 @@ <h2><a href="#pub-methods">Public functions</a></h2>
  P&amp;&amp; part = P()) -&gt; <a href="classtf_1_1Task.html" class="m-doc">Task</a></span>
  </dt>
  <dd>constructs an STL-styled parallel transform-reduce task</dd>
+ <dt>
+ <div class="m-doc-template">template&lt;typename B1, typename E1, typename B2, typename T, typename BOP_R, typename BOP_T, typename P = <a href="classtf_1_1GuidedPartitioner.html" class="m-doc">GuidedPartitioner</a>&gt;</div>
+ <span class="m-doc-wrap-bumper">auto <a href="#a7099ef62158a6e0770bc8ceef1961326" class="m-doc">transform_reduce</a>(</span><span class="m-doc-wrap">B1 first1,
+ E1 last1,
+ B2 first2,
+ T&amp; init,
+ BOP_R bop_r,
+ BOP_T bop_t,
+ P&amp;&amp; part = P()) -&gt; <a href="classtf_1_1Task.html" class="m-doc">Task</a></span>
+ </dt>
+ <dd>constructs an STL-styled parallel transform-reduce task</dd>
  <dt>
  <div class="m-doc-template">template&lt;typename B, typename E, typename D, typename BOP&gt;</div>
  <span class="m-doc-wrap-bumper">auto <a href="#a1c2ace9290d83c2a006614a4d66ad588" class="m-doc">inclusive_scan</a>(</span><span class="m-doc-wrap">B first,
@@ -1129,6 +1140,98 @@ <h3>
  </table>
 <p>The task spawns asynchronous tasks to perform parallel reduction over <code>init</code> and the transformed elements in the range <code>[first, last)</code>. The reduced result is stored in <code>init</code>. This method is equivalent to the parallel execution of the following loop:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr</span><span class="o">=</span><span class="n">first</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">!=</span><span class="n">last</span><span class="p">;</span><span class="w"> </span><span class="n">itr</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
 <span class="w"> </span><span class="n">init</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bop</span><span class="p">(</span><span class="n">init</span><span class="p">,</span><span class="w"> </span><span class="n">uop</span><span class="p">(</span><span class="o">*</span><span class="n">itr</span><span class="p">));</span><span class="w"></span>
+<span class="p">}</span><span class="w"></span></pre><p>Iterators are templated to enable stateful range using <a href="http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper.html" class="m-doc-external">std::<wbr />reference_wrapper</a>.</p><p>Please refer to <a href="ParallelReduction.html" class="m-doc">Parallel Reduction</a> for details.</p>
+ </div></section>
+ <section class="m-doc-details" id="a7099ef62158a6e0770bc8ceef1961326"><div>
+ <h3>
+ <div class="m-doc-template">
+ template&lt;typename B1, typename E1, typename B2, typename T, typename BOP_R, typename BOP_T, typename P = <a href="classtf_1_1GuidedPartitioner.html" class="m-doc">GuidedPartitioner</a>&gt;
+ </div>
+ <span class="m-doc-wrap-bumper"><a href="classtf_1_1Task.html" class="m-doc">Task</a> tf::<wbr />FlowBuilder::<wbr /></span><span class="m-doc-wrap"><span class="m-doc-wrap-bumper"><a href="#a7099ef62158a6e0770bc8ceef1961326" class="m-doc-self">transform_reduce</a>(</span><span class="m-doc-wrap">B1 first1,
+ E1 last1,
+ B2 first2,
+ T&amp; init,
+ BOP_R bop_r,
+ BOP_T bop_t,
+ P&amp;&amp; part = P())</span></span>
+ </h3>
+ <p>constructs an STL-styled parallel transform-reduce task</p>
+ <table class="m-table m-fullwidth m-flat">
+ <thead>
+ <tr><th colspan="2">Template parameters</th></tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="width: 1%">B1</td>
+ <td>first beginning iterator type</td>
+ </tr>
+ <tr>
+ <td>E1</td>
+ <td>first ending iterator type</td>
+ </tr>
+ <tr>
+ <td>B2</td>
+ <td>second beginning iterator type</td>
+ </tr>
+ <tr>
+ <td>T</td>
+ <td>result type</td>
+ </tr>
+ <tr>
+ <td>BOP_R</td>
+ <td>binary reducer type</td>
+ </tr>
+ <tr>
+ <td>BOP_T</td>
+ <td>binary transformation type</td>
+ </tr>
+ <tr>
+ <td>P</td>
+ <td>partitioner type (default <a href="classtf_1_1GuidedPartitioner.html" class="m-doc">tf::<wbr />GuidedPartitioner</a>)</td>
+ </tr>
+ </tbody>
+ <thead>
+ <tr><th colspan="2">Parameters</th></tr>
+ </thead>
+ <tbody>
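+ <tr>
+ <td>first1</td>
+ <td>iterator to the beginning of the first range (inclusive)</td>
+ </tr>
+ <tr>
+ <td>last1</td>
+ <td>iterator to the end of the first range (exclusive)</td>
+ </tr>
+ <tr>
+ <td>first2</td>
+ <td>iterator to the beginning of the second range</td>
+ </tr>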
+ <tr>
+ <td>init</td>
+ <td>initial value of the reduction and the storage for the reduced result</td>
+ </tr>
+ <tr>
+ <td>bop_r</td>
+ <td>binary operator that will be applied in unspecified order to the results of <code>bop_t</code></td>
+ </tr>
+ <tr>
+ <td>bop_t</td>
+ <td>binary operator that will be applied to transform each pair of corresponding elements from the two ranges to the result type</td>
+ </tr>
+ <tr>
+ <td>part</td>
+ <td>partitioning algorithm to schedule parallel iterations</td>
+ </tr>
+ </tbody>
+ <tfoot>
+ <tr>
+ <th>Returns</th>
+ <td>a <a href="classtf_1_1Task.html" class="m-doc">tf::<wbr />Task</a> handle</td>
+ </tr>
+ </tfoot>
+ </table>
+<p>The task spawns asynchronous tasks to perform parallel reduction over <code>init</code> and the results of applying <code>bop_t</code> to each element in the range <code>[first1, last1)</code> paired with the corresponding element of the range beginning at <code>first2</code>. The reduced result is stored in <code>init</code>. This method is equivalent to the parallel execution of the following loop:</p><pre class="m-code"><span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">itr1</span><span class="o">=</span><span class="n">first1</span><span class="p">,</span><span class="w"> </span><span class="n">itr2</span><span class="o">=</span><span class="n">first2</span><span class="p">;</span><span class="w"> </span><span class="n">itr1</span><span class="o">!=</span><span class="n">last1</span><span class="p">;</span><span class="w"> </span><span class="n">itr1</span><span class="o">++</span><span class="p">,</span><span class="w"> </span><span class="n">itr2</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
+<span class="w"> </span><span class="n">init</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bop_r</span><span class="p">(</span><span class="n">init</span><span class="p">,</span><span class="w"> </span><span class="n">bop_t</span><span class="p">(</span><span class="o">*</span><span class="n">itr1</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">itr2</span><span class="p">));</span><span class="w"></span>
 <span class="p">}</span><span class="w"></span></pre><p>Iterators are templated to enable stateful range using <a href="http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper.html" class="m-doc-external">std::<wbr />reference_wrapper</a>.</p><p>Please refer to <a href="ParallelReduction.html" class="m-doc">Parallel Reduction</a> for details.</p>
  </div></section>
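The equivalent loop documented for the two-range `transform_reduce` overload can be sketched sequentially with the standard library alone. This is only an illustration of the loop's semantics under stated assumptions, not Taskflow's implementation: `transform_reduce_reference` and `dot_product` are hypothetical names introduced here, nothing runs in parallel, and the second range is assumed to hold at least as many elements as the first.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sequential reference (not part of Taskflow): applies
//   init = bop_r(init, bop_t(*itr1, *itr2))
// to each pair of corresponding elements, as in the documented loop.
// Assumes the range starting at first2 is at least as long as [first1, last1).
template <typename B1, typename E1, typename B2, typename T,
          typename BOP_R, typename BOP_T>
void transform_reduce_reference(B1 first1, E1 last1, B2 first2, T& init,
                                BOP_R bop_r, BOP_T bop_t) {
  B2 itr2 = first2;  // declared separately so B1 and B2 may differ in type
  for (B1 itr1 = first1; itr1 != last1; ++itr1, ++itr2) {
    init = bop_r(init, bop_t(*itr1, *itr2));
  }
}

// Hypothetical helper: a dot product expressed as a two-range
// transform-reduce, with multiplication as bop_t and addition as bop_r.
inline int dot_product(const std::vector<int>& a, const std::vector<int>& b) {
  int init = 0;
  transform_reduce_reference(a.begin(), a.end(), b.begin(), init,
                             std::plus<int>{}, std::multiplies<int>{});
  return init;
}
```

Because the parallel version applies `bop_r` in unspecified order, a reducer such as addition, which is associative and commutative, is the kind of operator for which the parallel result would be expected to match this sequential reference.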
  <section class="m-doc-details" id="a1c2ace9290d83c2a006614a4d66ad588"><div>