From 6ed012b8023117845e100bccbf1dbd43ca265c84 Mon Sep 17 00:00:00 2001 From: Ravenwater Date: Sun, 5 Jan 2025 14:55:44 +0000 Subject: [PATCH] deploy: 2c11bfc485f6e0a8a4a9c0a27570da442027cdde --- 404.html | 2 +- blas/index.html | 8 ++--- blas/level1/index.html | 8 ++--- blas/level2/index.html | 8 ++--- blas/level3/index.html | 8 ++--- categories/analyzing/index.html | 6 ++-- categories/conditioning/index.html | 6 ++-- categories/design/index.html | 6 ++-- categories/domain-flow/index.html | 6 ++-- categories/dsp/index.html | 6 ++-- categories/filtering/index.html | 6 ++-- categories/identification/index.html | 6 ++-- categories/index.html | 6 ++-- categories/introduction/index.html | 6 ++-- categories/matrix-math/index.html | 6 ++-- categories/schedule/index.html | 6 ++-- categories/spacetime/index.html | 6 ++-- categories/transforming/index.html | 6 ++-- contentdev/index.html | 8 ++--- contentdev/prototype/index.html | 8 ++--- design/currentstate/index.html | 8 ++--- design/dfa/index.html | 8 ++--- design/dfm/index.html | 8 ++--- design/elements/index.html | 8 ++--- design/index.html | 8 ++--- design/nextsteps/index.html | 8 ++--- design/space/index.html | 6 ++-- design/time/index.html | 31 ++++++++++++------- dsp/conditioning/index.html | 6 ++-- dsp/filters/index.html | 6 ++-- dsp/identification/index.html | 8 ++--- dsp/index.html | 8 ++--- dsp/spectral/index.html | 6 ++-- dsp/transforms/index.html | 6 ++-- factorization/factorization/index.html | 6 ++-- factorization/index.html | 8 ++--- .../computational-spacetime/index.html | 8 ++--- introduction/derivation/index.html | 8 ++--- introduction/domain-flow/index.html | 8 ++--- introduction/example/index.html | 8 ++--- introduction/freeschedule/index.html | 8 ++--- introduction/index.html | 8 ++--- introduction/linearschedule/index.html | 8 ++--- introduction/nextsteps/index.html | 8 ++--- introduction/parallel-programming/index.html | 8 ++--- introduction/spacetime/index.html | 8 ++--- introduction/wavefront/index.html | 8 ++--- linearsolvers/index.html | 6 ++-- linearsolvers/lu/index.html | 8 ++--- linearsolvers/solvers/index.html | 6 ++-- matrixkernels/index.html | 8 ++--- matrixkernels/matrixkernels/index.html | 8 ++--- search/index.html | 8 ++--- tags/algorithm/index.html | 6 ++-- tags/computational-spacetime/index.html | 6 ++-- tags/conditioning/index.html | 6 ++-- tags/derivation/index.html | 6 ++-- tags/domain-flow/index.html | 6 ++-- tags/dsp/index.html | 6 ++-- tags/filtering/index.html | 6 ++-- tags/free-schedule/index.html | 6 ++-- tags/identification/index.html | 6 ++-- tags/index-space/index.html | 6 ++-- tags/index.html | 6 ++-- tags/lattice/index.html | 6 ++-- tags/linear-schedule/index.html | 6 ++-- tags/matrix-multiply/index.html | 6 ++-- tags/spectral-analysis/index.html | 6 ++-- tags/transform/index.html | 6 ++-- 69 files changed, 252 insertions(+), 243 deletions(-) diff --git a/404.html b/404.html index eac4623..0fff2f1 100644 --- a/404.html +++ b/404.html @@ -1,2 +1,2 @@ 404 Page not found - Domain Flow Architecture -

404

Not found

Whoops. Looks like this page doesn't exist ¯\_(ツ)_/¯.

Go to homepage

\ No newline at end of file +

404

Not found

Whoops. Looks like this page doesn't exist ¯\_(ツ)_/¯.

Go to homepage

\ No newline at end of file diff --git a/blas/index.html b/blas/index.html index 90e8831..9411492 100644 --- a/blas/index.html +++ b/blas/index.html @@ -7,16 +7,16 @@ components in computational methods, the investment can pay high dividends.">Basic Linear Algebra - Domain Flow Architecture -

Basic Linear Algebra

Basic Linear Algebra Subroutines are an historically significant set of +

Basic Linear Algebra

Basic Linear Algebra Subroutines are an historically significant set of functions that encapsulate the basic building blocks of a large collection of linear algebra algorithms and their implementations.

The BLAS library has proven to be a very productive mechanism to create and disseminate highly optimized numerical libraries to a plethora of computer architectures and machines. Writing high-performance linear -algebra algorithms turns out to be a tenacious problem, but since linear algebra operations are essential
components in computational methods, the investment can pay high dividends.

\ No newline at end of file + 
\ No newline at end of file diff --git a/blas/level1/index.html b/blas/level1/index.html index ec302d9..863c989 100644 --- a/blas/level1/index.html +++ b/blas/level1/index.html @@ -7,7 +7,7 @@ vector scale: scalar-vector multiplication: $z = \alpha x \implies (z_i = \alpha x_i)$ vector element addition: $z = x + y \implies (z_i = x_i + y_i)$ vector element multiply: $z = x * y \implies (z_i = x_i * y_i)$ vector dot product: $c = x^Ty \implies ( c = \sum_{i = 1}^n x_i y_i ) $, aka inner-product saxpy, or scalar alpha x plus y, $z = \alpha x + y \implies z_i = \alpha x_i + y_i $ The fifth operator, while technically redundant, makes the expressions of linear algebra algorithms more concise.">BLAS Level 1 - Domain Flow Architecture -

BLAS Level 1

BLAS Level 1 are $\mathcal{O}(N)$ class operators. This makes these operators operand access limited +

BLAS Level 1

BLAS Level 1 are $\mathcal{O}(N)$ class operators. This makes these operators operand-access limited, and they thus require careful distribution in a parallel environment.

There are four basic vector operations, and a fifth convenience operator. Let $ \alpha \in \Bbb{R}, x \in \Bbb{R^n}, y \in \Bbb{R^n}, z \in \Bbb{R^n}$; then:

  1. vector scale: scalar-vector multiplication: $z = \alpha x \implies (z_i = \alpha x_i)$
  2. vector element addition: $z = x + y \implies (z_i = x_i + y_i)$
  3. vector element multiply: $z = x * y \implies (z_i = x_i * y_i)$
  4. vector dot product: $c = x^Ty \implies ( c = \sum_{i = 1}^n x_i y_i ) $, aka inner-product
  5. saxpy, or scalar alpha x plus y, $z = \alpha x + y \implies z_i = \alpha x_i + y_i $

The fifth operator, while technically redundant, makes the expressions of linear algebra algorithms more concise.
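As a point of reference, here is a minimal sequential C sketch of the five operators over dense, unit-stride vectors. The function names are illustrative, not the actual BLAS interface:

#include <stddef.h>

/* Reference sketches of the five Level-1 operators over dense,
   unit-stride vectors. Illustrative names, not the BLAS API. */
void vscale(size_t n, double alpha, const double* x, double* z) {
    for (size_t i = 0; i < n; ++i) z[i] = alpha * x[i];        /* z = alpha x */
}
void vadd(size_t n, const double* x, const double* y, double* z) {
    for (size_t i = 0; i < n; ++i) z[i] = x[i] + y[i];         /* z = x + y */
}
void vmul(size_t n, const double* x, const double* y, double* z) {
    for (size_t i = 0; i < n; ++i) z[i] = x[i] * y[i];         /* z = x * y */
}
double vdot(size_t n, const double* x, const double* y) {
    double c = 0.0;                                            /* c = x^T y */
    for (size_t i = 0; i < n; ++i) c += x[i] * y[i];
    return c;
}
void saxpy(size_t n, double alpha, const double* x, const double* y, double* z) {
    for (size_t i = 0; i < n; ++i) z[i] = alpha * x[i] + y[i]; /* z = alpha x + y */
}

Each operator performs $\mathcal{O}(N)$ arithmetic on $\mathcal{O}(N)$ operands, which is exactly why this class is operand-access limited: there is no operand reuse to amortize the cost of moving the vectors.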

One class of domain flow programs for these operators assumes a linear distribution of the vectors, @@ -51,12 +51,12 @@ z: alpha[i-1,j,k] * x[i,j-1,k] + y[i,j,k-1] }

The final operator, scalar alpha x plus y, or saxpy, combines the vector scale and vector addition operators, and shows the same constraint as the vector scale operator due to the required propagation (broadcast) of the scaling factor.

\ No newline at end of file diff --git a/blas/level2/index.html b/blas/level2/index.html index 2391a30..c865274 100644 --- a/blas/level2/index.html +++ b/blas/level2/index.html @@ -3,7 +3,7 @@ Let $A \in \Bbb{R^{mxn}}$, the matrix-vector product is defined as: $$z = Ax, \space where \space x \in \Bbb{R^n}$$">BLAS Level 2 - Domain Flow Architecture -

BLAS Level 2

BLAS Level 2 are $\mathcal{O}(N^2)$ class operators, still operand access +

BLAS Level 2

BLAS Level 2 are $\mathcal{O}(N^2)$ class operators, still operand access limited as we need to fetch multiple operands per operation without any reuse. The core operator is the matrix-vector multiplication in all its different forms specialized for matrix shape — triangular, banded, symmetric —, and matrix type — integer, real, complex, @@ -15,12 +15,12 @@ x: x[i,j-1,k] z: a[i,j,k-1] * x[i,j-1,k] }

Banded, symmetric, and triangular versions simply alter the constraint set of the domains of -computation: the fundamental dependencies do not change.
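For reference, a minimal sequential C sketch of the dense case $z = Ax$, assuming a row-major $m \times n$ matrix; an illustrative sketch, not the BLAS gemv interface:

#include <stddef.h>

/* Reference matrix-vector product z = A x for a dense, row-major
   m-by-n matrix A. Illustrative sketch only. */
void matvec(size_t m, size_t n, const double* A, const double* x, double* z) {
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;                 /* z_i = sum_j A(i,j) * x_j */
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        z[i] = acc;
    }
}

Every element of $A$ is fetched exactly once and never reused, which is the access pattern that keeps this class operand-access limited; a banded or triangular variant merely changes the bounds of the inner loop, leaving the dependence structure intact.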

\ No newline at end of file diff --git a/blas/level3/index.html b/blas/level3/index.html index 2e250f3..4cbc6e2 100644 --- a/blas/level3/index.html +++ b/blas/level3/index.html @@ -11,7 +11,7 @@ In addition to matrix-matrix multiply there are the Rank-k update operators, which are outer-products and matrix additions. Here is a Hermitian Rank-k update: $$ C = \alpha A A^T + \beta C, \space where \space C \space is \space Hermitian. $$ A Hermitian matrix is defined as a matrix that is equal to its Hermitian conjugate. In other words, the matrix $C$ is Hermitian if and only if $C = C^H$. Obviously a Hermitian matrix must be square. Hermitian matrices can be understood as the complex extension of real symmetric matrices.">BLAS Level 3 - Domain Flow Architecture -

BLAS Level 3

BLAS Level 3 are $\mathcal{O}(N^3)$ operators, and finally compute bound +

BLAS Level 3

BLAS Level 3 are $\mathcal{O}(N^3)$ operators, and are finally compute bound, creating many opportunities to optimize operand reuse.

In addition to matrix-matrix multiply there are the Rank-k update operators, which are outer-products and matrix additions.

Here is a Hermitian Rank-k update:

$$ C = \alpha A A^H + \beta C, \space where \space C \space is \space Hermitian. $$

A Hermitian matrix is defined as a matrix that is equal to its Hermitian conjugate. In other words, the matrix $C$ is Hermitian if and only if $C = C^H$. Obviously a Hermitian @@ -32,12 +32,12 @@ c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k] } } -

Here we introduce a conditional constraint that impacts the domain of computation for a set of equations.
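A minimal sequential C sketch of the real-valued analogue of this update, $C = \alpha A A^T + \beta C$ with $C$ symmetric, makes the conditional constraint concrete: only the triangle $j \le i$ is computed. The names and row-major layout are illustrative assumptions:

#include <stddef.h>

/* Symmetric rank-k update C = alpha*A*A^T + beta*C, with A m-by-k and
   C m-by-m, row-major. Only the lower triangle (j <= i) is computed,
   mirroring the conditional constraint on the domain of computation. */
void syrk(size_t m, size_t k, double alpha, const double* A,
          double beta, double* C) {
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j <= i; ++j) {           /* conditional constraint */
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * A[j * k + p]; /* (A A^T)_{ij} */
            C[i * m + j] = alpha * acc + beta * C[i * m + j];
        }
    }
}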

\ No newline at end of file diff --git a/categories/analyzing/index.html b/categories/analyzing/index.html index f741c54..01240fe 100644 --- a/categories/analyzing/index.html +++ b/categories/analyzing/index.html @@ -1,10 +1,10 @@ Analyzing - Category - Domain Flow Architecture -

Category - Analyzing

S

\ No newline at end of file diff --git a/categories/conditioning/index.html b/categories/conditioning/index.html index eb4443c..fb648d0 100644 --- a/categories/conditioning/index.html +++ b/categories/conditioning/index.html @@ -1,10 +1,10 @@ Conditioning - Category - Domain Flow Architecture -

Category - Conditioning

S

\ No newline at end of file diff --git a/categories/design/index.html b/categories/design/index.html index befb135..3eb5530 100644 --- a/categories/design/index.html +++ b/categories/design/index.html @@ -1,10 +1,10 @@ Design - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/domain-flow/index.html b/categories/domain-flow/index.html index e830f87..10827b0 100644 --- a/categories/domain-flow/index.html +++ b/categories/domain-flow/index.html @@ -1,10 +1,10 @@ Domain-Flow - Category - Domain Flow Architecture -

Category - Domain-Flow

A

  • An Example

C

D

F

L

P

\ No newline at end of file diff --git a/categories/dsp/index.html b/categories/dsp/index.html index 80186f0..fbe9604 100644 --- a/categories/dsp/index.html +++ b/categories/dsp/index.html @@ -1,10 +1,10 @@ Dsp - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/filtering/index.html b/categories/filtering/index.html index 2e4fc01..7f59cbc 100644 --- a/categories/filtering/index.html +++ b/categories/filtering/index.html @@ -1,10 +1,10 @@ Filtering - Category - Domain Flow Architecture -

Category - Filtering

D

\ No newline at end of file diff --git a/categories/identification/index.html b/categories/identification/index.html index 19900b6..0fb0f06 100644 --- a/categories/identification/index.html +++ b/categories/identification/index.html @@ -1,10 +1,10 @@ Identification - Category - Domain Flow Architecture -

Category - Identification

I

\ No newline at end of file diff --git a/categories/index.html b/categories/index.html index 35df1bc..14d82bd 100644 --- a/categories/index.html +++ b/categories/index.html @@ -1,10 +1,10 @@ Categories - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/introduction/index.html b/categories/introduction/index.html index 6f96fe5..2770287 100644 --- a/categories/introduction/index.html +++ b/categories/introduction/index.html @@ -1,10 +1,10 @@ Introduction - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/matrix-math/index.html b/categories/matrix-math/index.html index 9c0030b..781539d 100644 --- a/categories/matrix-math/index.html +++ b/categories/matrix-math/index.html @@ -1,10 +1,10 @@ Matrix-Math - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/schedule/index.html b/categories/schedule/index.html index e1dbfa5..7ea787b 100644 --- a/categories/schedule/index.html +++ b/categories/schedule/index.html @@ -1,10 +1,10 @@ Schedule - Category - Domain Flow Architecture -

Category - Schedule

F

L

\ No newline at end of file diff --git a/categories/spacetime/index.html b/categories/spacetime/index.html index 4870f35..b3b195d 100644 --- a/categories/spacetime/index.html +++ b/categories/spacetime/index.html @@ -1,10 +1,10 @@ Spacetime - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/transforming/index.html b/categories/transforming/index.html index a8d2b29..7bb52bf 100644 --- a/categories/transforming/index.html +++ b/categories/transforming/index.html @@ -1,10 +1,10 @@ Transforming - Category - Domain Flow Architecture -

Category - Transforming

T

  • Transforms
\ No newline at end of file diff --git a/contentdev/index.html b/contentdev/index.html index c6da4f8..ccf13b4 100644 --- a/contentdev/index.html +++ b/contentdev/index.html @@ -1,11 +1,11 @@ Content Development - Domain Flow Architecture -

Content Development

The following pages are examples for content developers to quickly add interactive -content that aids in understanding parallel algorithm design and optimization.

\ No newline at end of file diff --git a/contentdev/prototype/index.html b/contentdev/prototype/index.html index 49d8b03..133cd9e 100644 --- a/contentdev/prototype/index.html +++ b/contentdev/prototype/index.html @@ -3,7 +3,7 @@ All you need is a tag with an id and some CSS styling and a call into an animation program that fills that canvas.">Prototype - Domain Flow Architecture -

Prototype

Prototype

This is a basic skeleton of a Hugo Markdown page that includes a three.js animation.

All you need is a <canvas> tag with an id and some CSS styling and a call into -an animation program that fills that canvas.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/currentstate/index.html b/design/currentstate/index.html index 8204d93..ade2ed3 100644 --- a/design/currentstate/index.html +++ b/design/currentstate/index.html @@ -1,5 +1,5 @@ Computational Dynamics - Domain Flow Architecture -

Computational Dynamics

A memory access in a physical machine can be very complex. For example, +

Computational Dynamics

A memory access in a physical machine can be very complex. For example, when a program accesses an operand located at an address that is not in physical memory, the processor registers a page miss. The performance difference between an access from the local L1 cache versus a page miss @@ -33,12 +33,12 @@ modulation due to power constraints, causes the collective to wait for the slowest process. As the number of processors grows, so does variability. And unfortunately, when variability rises, processor utilization drops and algorithmic performance suffers.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/dfa/index.html b/design/dfa/index.html index 421c05e..e26aa28 100644 --- a/design/dfa/index.html +++ b/design/dfa/index.html @@ -1,5 +1,5 @@ Domain Flow Architecture -

Domain Flow Architecture

Domain Flow Architecture (DFA) machines are the class of machines that execute +

Domain Flow Architecture

Domain Flow Architecture (DFA) machines are the class of machines that execute using the domain flow execution model. The fundamental problem limiting the energy efficiency of the data flow machine is the size of the CAM and fabric. As they are managed as two separate clusters of resources, @@ -15,12 +15,12 @@ exhibit partial orders that are regular and are separated in space. That is a mouthful, but we can make this more tangible when we discuss in more detail the temporal behavior of a domain flow program in the -next section about time.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/dfm/index.html b/design/dfm/index.html index e14301e..5ae318d 100644 --- a/design/dfm/index.html +++ b/design/dfm/index.html @@ -3,7 +3,7 @@ write an operand into an appropriate operand slot in an instruction token stored in a Content Addressable Memory (CAM) by an instruction tag check if all operands are present to start the execution cycle of the instruction if an instruction is ready then extract it from the CAM and inject it into a fabric of computational elements deliver the instruction to an available execution unit execute the instruction, and finally write the result back into an operand slot in target instruction token stored in the CAM The strength of the resource contention management of the Data Flow Machine is that the machine can execute along the free schedule, that is, the inherent parallelism of the algorithm. Any physical implementation, however, is constrained by the energy-efficiency of the CAM and the network that connects the CAM to the fabric of computational elements. As concurrency demands grow the efficiency of both the CAM and the fabric decreases making large data flow machines unattractive. However, small data flow machines don’t have this problem and are able to deliver energy-efficient, low-latency resource management. Today, all high-performance microprocessors have a data flow machine at their core.">Data Flow Machine - Domain Flow Architecture -

Data Flow Machine

In the late 60’s and 70’s when computer scientists were exploring parallel +

Data Flow Machine

In the late 60’s and 70’s, when computer scientists were exploring parallel computation by building the first parallel machines and developing the parallel algorithm complexity theory, folks realized that this over-constrained specification was a real problem for concurrency. @@ -18,12 +18,12 @@ decreases, making large data flow machines unattractive. However, small data flow machines don’t have this problem and are able to deliver energy-efficient, low-latency resource management. Today, all high-performance microprocessors have a data flow machine at their core.
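A toy C sketch of the firing rule at the heart of such a machine may help: an instruction token waits in the CAM until all of its operand slots have been written, and only then is it extracted and executed. The token structure and names are hypothetical, for illustration only:

#include <stdbool.h>
#include <stdio.h>

/* One instruction token: an operator tag plus operand slots that are
   filled asynchronously by tag-matching writes. */
typedef struct {
    const char* op;         /* operator tag */
    double      slot[2];    /* operand slots */
    bool        present[2]; /* which slots have been written */
} token_t;

/* A tag check writes an operand into one slot of a matching token. */
static void write_operand(token_t* t, int slot, double value) {
    t->slot[slot] = value;
    t->present[slot] = true;
}

/* If all operands are present, extract the token and execute it. */
static bool try_fire(token_t* t, double* result) {
    if (!(t->present[0] && t->present[1])) return false; /* still waiting */
    *result = t->slot[0] * t->slot[1];   /* 'execute' a multiply */
    t->present[0] = t->present[1] = false;
    return true;
}

int main(void) {
    token_t mul = { "mul", {0.0, 0.0}, {false, false} };
    double r;
    write_operand(&mul, 0, 3.0);
    if (!try_fire(&mul, &r)) printf("token waiting in CAM\n");
    write_operand(&mul, 1, 7.0);
    if (try_fire(&mul, &r)) printf("fired: %g\n", r);    /* fired: 21 */
    return 0;
}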

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/elements/index.html b/design/elements/index.html index 620e88f..f1fb2e9 100644 --- a/design/elements/index.html +++ b/design/elements/index.html @@ -15,7 +15,7 @@ Item #2 is well-known among high-performance algorithm designers. Item #3 is well-known among hardware designers and computer engineers. When designing domain flow algorithms, we are looking for an energy efficient embedding of a computational graph in space, and it is thus to be expected that we need to combine all three attributes of minimizing operator count, operand movement, and resource contention. The complexity of minimizing resource contention is what makes hardware design so much more complex. But the complexity of operator contention can be mitigated by clever resource contention management.">Elements of Design - Domain Flow Architecture -

Elements of Design

We can summarize the attributes of good parallel algorithm design as

  1. low operation count, where operation count is defined as the sum of operators and operand accesses
  2. minimal operand movement
  3. minimal resource contention

Item #1 is well-known by theoretical computer scientists.

Item #2 is well-known among high-performance algorithm designers.

Item #3 is well-known among hardware designers and computer engineers.

When designing domain flow algorithms, we are looking for an energy +

Elements of Design

We can summarize the attributes of good parallel algorithm design as

  1. low operation count, where operation count is defined as the sum of operators and operand accesses
  2. minimal operand movement
  3. minimal resource contention

Item #1 is well-known by theoretical computer scientists.

Item #2 is well-known among high-performance algorithm designers.

Item #3 is well-known among hardware designers and computer engineers.

When designing domain flow algorithms, we are looking for an energy-efficient embedding of a computational graph in space, and it is thus to be expected that we need to combine all three attributes of minimizing operator count, operand movement, and resource contention. @@ -31,12 +31,12 @@ it forces a total order on the computation graph. This task of creating the total order falls on the algorithm designer.

For parallel execution we need a resource contention management mechanism that is more efficient. And this is where our -computational spacetime will come in handy.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/index.html b/design/index.html index a78823e..b18f27b 100644 --- a/design/index.html +++ b/design/index.html @@ -1,17 +1,17 @@ Elements of Good Design - Domain Flow Architecture -

Elements of Good Design

The best algorithms for sequential execution are those that minimize the number +

Elements of Good Design

The best algorithms for sequential execution are those that minimize the number of operations to yield results. Computational complexity theory has aided this quest, but any performance-minded algorithm designer knows that the best theoretical algorithms are not necessarily the fastest when executed on real hardware. The difference is typically caused by the trade-off sequential algorithms have to make between computation and accessing memory. The constraints of data movement are even more pronounced in parallel algorithms as demonstrated in the previous section.

This chapter explores the elements of good design for parallel algorithms and their -execution on real hardware.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/nextsteps/index.html b/design/nextsteps/index.html index d4f27ff..f8eb786 100644 --- a/design/nextsteps/index.html +++ b/design/nextsteps/index.html @@ -3,19 +3,19 @@ Once we get a good collection of fast, and energy efficient algorithms together, we can start to explore how best to engineer combinations of these operators. We will discover that sometimes, the cost of an information exchange makes a whole class of algorithms unattractive for parallel executions. With that insight comes the need to create new algorithms and sometimes completely new mathematical approaches to properly leverage the available resources.">Next Steps - Domain Flow Architecture -

Next Steps

In this short introduction to parallel algorithms in general and domain flow +

Next Steps

In this short introduction to parallel algorithms in general and domain flow in particular, our next step is to look at specific algorithms, and explore their optimal parallel execution dynamics.

Once we get a good collection of fast, energy-efficient algorithms together, we can start to explore how best to engineer combinations of these operators. We will discover that sometimes the cost of an information exchange makes a whole class of algorithms unattractive for parallel execution. With that insight comes the need to create new algorithms and sometimes completely new mathematical approaches to properly leverage the available resources.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/space/index.html b/design/space/index.html index 600d702..ea6b5c8 100644 --- a/design/space/index.html +++ b/design/space/index.html @@ -1,10 +1,10 @@ Space: the where - Domain Flow Architecture -

Space: the where

Space

\ No newline at end of file diff --git a/design/time/index.html b/design/time/index.html index a2910fa..aed4b66 100644 --- a/design/time/index.html +++ b/design/time/index.html @@ -2,8 +2,8 @@ Let x be a computation that uses y as input, then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} 1, & \text{if y is an external input}\\ 1 + max(T(y)) & \text{otherwise} \end{cases} \end{equation} The free schedule is defined at the level of the individual operations. It does not provide any information about the global data movement or the global structure of the interactions between data and operation. Moreover, the above equation describes a logical sequencing of operations, it does not specify a physical evolution.">Time: the when - Domain Flow Architecture -

Time: the when

The free schedule represents the inherent concurrency of the parallel algorithm, and, under the assumption +Let x be a computation that uses y as input, then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} 1, & \text{if y is an external input}\\ 1 + max(T(y)) & \text{otherwise} \end{cases} \end{equation} The free schedule is defined at the level of the individual operations. It does not provide any information about the global data movement or the global structure of the interactions between data and operation. Moreover, the above equation describes a logical sequencing of operations, it does not specify a physical evolution.">Time: the when - Domain Flow Architecture +

Time: the when

The free schedule represents the inherent concurrency of the parallel algorithm, and, under the assumption of infinite resources, it is the fastest schedule possible.

Let $x$ be a computation that uses $y$ as input; then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} 1, & \text{if } y \text{ is an external input}\\ 1 + \max(T(y)), & \text{otherwise} \end{cases} \end{equation} The free schedule is defined at the level of the individual operations. It does not provide any information about the global data movement or the global structure of the interactions between data and operations. Moreover, the above equation describes a logical sequencing of operations; it does not specify a physical evolution. @@ -28,23 +28,23 @@ of algorithms with a simpler order called a linear schedule. This provides a concise description of the global activity of a single variable uniform recurrence equation.
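A small C sketch of this definition, evaluating $T(x)$ by recursion over a hand-built dependence graph, may help. This is an illustrative toy with assumed node and graph structure; a real implementation would memoize $T$ over a shared DAG:

#include <stdio.h>

/* Toy evaluation of the free schedule: T(x) = 1 for external inputs,
   else 1 + max over the inputs y of T(y). */
#define MAX_IN 2
typedef struct node { int n_in; const struct node* in[MAX_IN]; } node_t;

static int T(const node_t* x) {
    if (x->n_in == 0) return 1;          /* external input */
    int tmax = 0;
    for (int i = 0; i < x->n_in; ++i) {  /* 1 + max(T(y)) */
        int t = T(x->in[i]);
        if (t > tmax) tmax = t;
    }
    return 1 + tmax;
}

int main(void) {
    /* a, b external; c = f(a,b); d = g(c,b) */
    node_t a = {0}, b = {0};
    node_t c = {2, {&a, &b}};
    node_t d = {2, {&c, &b}};
    printf("T(d) = %d\n", T(&d));        /* prints T(d) = 3 */
    return 0;
}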

We can extend that framework to work with systems of uniform recurrence -equations. And we can bring the class of algorithms described by systems -of affine recurrence equations into this new framework by transforming +equations (SURE). And we can bring the class of algorithms described by systems +of affine recurrence equations into this SURE framework by transforming the affine transformations into propagations. The affine maps typically represent gathers (reductions) and scatters (broadcasts), and these affine transformations can be made uniform by using uniform dependencies to propagate the values through the index space.

Whereas it is sufficient to solve a single linear program to determine if a single variable uniform recurrence is explicitly defined, the procedure to test computability of a system of equations requires an -iterative decomposition of the dependence graph into strong connected +iterative decomposition of the dependence graph into strongly connected components.

A directed graph is called strongly connected if any two distinct vertices lie in a common cycle. A strongly connected component of a -directed graph is a strong connected subgraph not properly contained +directed graph is a strongly connected subgraph not properly contained in any other strongly connected subgraph. A directed graph may contain several strong components, and each vertex lies in exactly one strongly -connected component.

We can decompose the graph representing the system of recurrence +connected component.

We can decompose the graph representing the system of uniform recurrence equations into a hierarchy of strongly connected components as follows: create a root node of the tree containing the original dependence graph. Determine the strongly connected components of G and create a child of the root for each strongly connected component. Then apply a zero-weight cycle search procedure on each of the strongly connected components and @@ -57,12 +57,21 @@ parallelism of the algorithm. Karp, Miller, and Winograd provide a proof that bounds the free schedule for each of the subgraphs that reside in the nodes of the decomposition tree. They use this bound to quantify the amount of parallelism.
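The first step of this procedure, finding the strongly connected components, is classic. Here is a compact C sketch of Tarjan's algorithm over a small adjacency-matrix graph; the fixed-size arrays and the example graph are illustrative assumptions, not production code:

#include <stdio.h>

#define N 4
static int adj[N][N] = {
    {0,1,0,0},   /* 0 -> 1 */
    {0,0,1,0},   /* 1 -> 2 */
    {1,0,0,1},   /* 2 -> 0 and 2 -> 3: {0,1,2} form one SCC */
    {0,0,0,0}    /* 3 has no out-edges: a singleton SCC */
};
static int idx[N], low[N], on_stack[N], stk[N], comp[N];
static int counter = 0, sp = 0, ncomp = 0;

static int min(int a, int b) { return a < b ? a : b; }

static void strongconnect(int v) {
    idx[v] = low[v] = ++counter;         /* DFS discovery index */
    stk[sp++] = v; on_stack[v] = 1;
    for (int w = 0; w < N; ++w) {
        if (!adj[v][w]) continue;
        if (idx[w] == 0) { strongconnect(w); low[v] = min(low[v], low[w]); }
        else if (on_stack[w]) low[v] = min(low[v], idx[w]);
    }
    if (low[v] == idx[v]) {              /* v is the root of an SCC */
        int w;
        do { w = stk[--sp]; on_stack[w] = 0; comp[w] = ncomp; } while (w != v);
        ++ncomp;
    }
}

int main(void) {
    for (int v = 0; v < N; ++v) if (idx[v] == 0) strongconnect(v);
    for (int v = 0; v < N; ++v) printf("vertex %d in SCC %d\n", v, comp[v]);
    return 0;   /* expect {0,1,2} grouped together and {3} alone */
}

Each SCC found here would become a child node in the decomposition tree described above, with the zero-weight cycle search then applied per component.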

\ No newline at end of file + 
\ No newline at end of file diff --git a/dsp/conditioning/index.html b/dsp/conditioning/index.html index 25aa9ce..99751ac 100644 --- a/dsp/conditioning/index.html +++ b/dsp/conditioning/index.html @@ -1,10 +1,10 @@ Signal Conditioning - Domain Flow Architecture -
\ No newline at end of file diff --git a/dsp/filters/index.html b/dsp/filters/index.html index f06d4ab..77f4d23 100644 --- a/dsp/filters/index.html +++ b/dsp/filters/index.html @@ -1,10 +1,10 @@ Digital Filtering - Domain Flow Architecture -
\ No newline at end of file diff --git a/dsp/identification/index.html b/dsp/identification/index.html index d751b02..1bc7510 100644 --- a/dsp/identification/index.html +++ b/dsp/identification/index.html @@ -3,7 +3,7 @@ When there are signals and noises, physicists try to identify signals by modeling them, whereas statisticians oppositely try to model noise to identify signals. In this study, we applied the statisticians’ concept of signal detection of physics data with small-size samples and high dimensions without modeling the signals. Most of the data in nature, whether noises or signals, are assumed to be generated by dynamical systems; thus, there is essentially no distinction between these generating processes. We propose that the correlation length of a dynamical system and the number of samples are crucial for the practical definition of noise variables among the signal variables generated by such a system. Since variables with short-term correlations reach normal distributions faster as the number of samples decreases, they are regarded to be noise-like variables, whereas variables with opposite properties are signal-like variables. Normality tests are not effective for data of small-size samples with high dimensions. Therefore, we modeled noises on the basis of the property of a noise variable, that is, the uniformity of the histogram of the probability that a variable is a noise. We devised a method of detecting signal variables from the structural change of the histogram according to the decrease in the number of samples. We applied our method to the data generated by globally coupled map, which can produce time series data with different correlation lengths, and also applied to gene expression data, which are typical static data of small-size samples with high dimensions, and we successfully detected signal variables from them.">Identification - Domain Flow Architecture -

Identification

Identification is the act of recognizing the signal in the presence of noise.

When there are signals and noises, physicists try to identify signals by modeling them, +

Identification

Identification is the act of recognizing the signal in the presence of noise.

When there are signals and noises, physicists try to identify signals by modeling them, whereas statisticians oppositely try to model noise to identify signals. In this study, we applied the statisticians’ concept of signal detection of physics data with small-size samples and high dimensions without modeling the signals. Most of the data in nature, @@ -22,12 +22,12 @@ to the data generated by globally coupled map, which can produce time series data with different correlation lengths, and also applied to gene expression data, which are typical static data of small-size samples with high dimensions, and we successfully -detected signal variables from them.

\ No newline at end of file + 
\ No newline at end of file diff --git a/dsp/index.html b/dsp/index.html index dc7be25..1867b1a 100644 --- a/dsp/index.html +++ b/dsp/index.html @@ -1,13 +1,13 @@ Digital Signal Processing - Domain Flow Architecture -

Digital Signal Processing

Digital Signal Processing is the discrete realization of Analog Signal Processing +

Digital Signal Processing

Digital Signal Processing is the discrete realization of Analog Signal Processing operations used to condition, amplify, characterize, and transform signals. Digital Signal Processing is essential when interfacing a digital computer to a physical process to enable reproducible and high-fidelity applications.

\ No newline at end of file + 
\ No newline at end of file diff --git a/dsp/spectral/index.html b/dsp/spectral/index.html index 583c94b..453dab0 100644 --- a/dsp/spectral/index.html +++ b/dsp/spectral/index.html @@ -1,10 +1,10 @@ Spectral Analysis - Domain Flow Architecture -
\ No newline at end of file diff --git a/dsp/transforms/index.html b/dsp/transforms/index.html index 2ebf81e..741953f 100644 --- a/dsp/transforms/index.html +++ b/dsp/transforms/index.html @@ -1,10 +1,10 @@ Transforms - Domain Flow Architecture -
\ No newline at end of file diff --git a/factorization/factorization/index.html b/factorization/factorization/index.html index 0cd0f8a..1f5f145 100644 --- a/factorization/factorization/index.html +++ b/factorization/factorization/index.html @@ -3,12 +3,12 @@ $$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$">Matrix Factorizations - Domain Flow Architecture -

Matrix Factorizations

This is the quadratic formula:

$$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$
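A direct C evaluation of this formula, assuming real coefficients, $a \neq 0$, and a non-negative discriminant; a robust version would also handle complex roots and the cancellation that occurs when $b^2 \gg 4ac$:

#include <math.h>
#include <stdio.h>

/* Roots of a*x^2 + b*x + c = 0 via the quadratic formula.
   Minimal sketch: complex roots are reported, not computed. */
int quadratic_roots(double a, double b, double c, double* r1, double* r2) {
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0) return -1;           /* complex roots: not handled */
    double s = sqrt(disc);
    *r1 = (-b + s) / (2.0 * a);
    *r2 = (-b - s) / (2.0 * a);
    return 0;
}

int main(void) {
    double r1, r2;
    if (quadratic_roots(1.0, -3.0, 2.0, &r1, &r2) == 0)
        printf("roots: %g %g\n", r1, r2);  /* roots: 2 1 */
    return 0;
}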
\ No newline at end of file diff --git a/factorization/index.html b/factorization/index.html index e2d89cb..83f1023 100644 --- a/factorization/index.html +++ b/factorization/index.html @@ -1,12 +1,12 @@ Matrix Factorization - Domain Flow Architecture -

Matrix Factorization

Matrix factorizations are the work horse of linear algebra applications. +

Matrix Factorization

Matrix factorizations are the workhorse of linear algebra applications. Factorizations create equivalences that improve the usability or robustness of an algorithm.

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/computational-spacetime/index.html b/introduction/computational-spacetime/index.html index 604221d..59b0855 100644 --- a/introduction/computational-spacetime/index.html +++ b/introduction/computational-spacetime/index.html @@ -1,5 +1,5 @@ Computational Spacetime - Domain Flow Architecture -

Computational Spacetime

Computational Spacetime

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/derivation/index.html b/introduction/derivation/index.html index 51315e0..cfd27c6 100644 --- a/introduction/derivation/index.html +++ b/introduction/derivation/index.html @@ -3,7 +3,7 @@ The Linear Algebra universe is particularly rich in partial orders, something that has been exploited for centuries 1. Matrix Computations2 by Golub, and van Loan provide a comprehensive review. What follows may be a bit technical, but keep in mind the visualizations of the previous pages as you try to visualize what the math implies.">Derivation of the matrix multiply domain flow program - Domain Flow Architecture -

Derivation of the matrix multiply domain flow program

The concepts of partial and total orders are essential for finding optimal domain flow algorithms. +

Derivation of the matrix multiply domain flow program

The concepts of partial and total orders are essential for finding optimal domain flow algorithms. Partial orders, or posets, are the source of high-performance, low-power execution patterns.

The Linear Algebra universe is particularly rich in partial orders, something that has been exploited for centuries 1. Matrix Computations2 by Golub and van Loan provides a comprehensive review. What follows may be a bit technical, but keep in mind the visualizations of the previous pages as you try to picture what the math implies. @@ -83,12 +83,12 @@ b: b[i-1,j,k] c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k] }

1: History of Matrices and Determinants

2: Matrix Computations, Gene Golub and Charles van Loan

\ No newline at end of file diff --git a/introduction/domain-flow/index.html b/introduction/domain-flow/index.html index c6552f7..b7132d5 100644 --- a/introduction/domain-flow/index.html +++ b/introduction/domain-flow/index.html @@ -7,7 +7,7 @@ Implementation technology will impact these phases differently, and we are seeking a programming model that is invariant to the difference. A thought experiment will shed light on the desired properties of such a model.">Domain Flow - Domain Flow Architecture -

Domain Flow

Domain Flow

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/example/index.html b/introduction/example/index.html index 1859d52..a21d033 100644 --- a/introduction/example/index.html +++ b/introduction/example/index.html @@ -3,7 +3,7 @@ compute ( (i,j,k) | 1 <= i,j,k <= N ) { a: a[i,j-1,k] b: b[i-1,j,k] c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k] } The underlying algorithm requires a domain of computation governed by a set of constraints, and a set of computational dependencies that implicitly define a partial order across all the operations in the computation. The partial order is readily visible in the need to have computed the result for $c[i,j,k-1]$ before the computation of $c[i,j,k]$ can commence. In contrast, the $a$ and $b$ recurrences are independent of each other.">An Example - Domain Flow Architecture -

An Example

Let’s look at a simple, but frequently used operator in Deep Learning inference: +

An Example

Let’s look at a simple, but frequently used operator in Deep Learning inference: dense matrix multiplication. A Domain Flow program 1 for this operator is shown below:

compute ( (i,j,k) | 1 <= i,j,k <= N ) {
     a: a[i,j-1,k]
@@ -38,12 +38,12 @@
 where the variable $a$ is defined.

A thorough understanding of the partial and total orders inherent in the parallel computation is essential for finding optimal domain flow algorithms.

High-performance, low-power execution patterns frequently involve a partial order that enables timely reuse of computational results, or creates flexibility to organize just-in-time arrival -of input operands to avoid memory elements.
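To make the connection to a conventional total order concrete, here is a sequential C rendering of the three recurrences, with the input matrices loaded on the boundary faces of the index space. The chosen loop nest is just one legal schedule among the many the partial order admits; this is an illustrative sketch that trades memory for clarity:

#include <stddef.h>

/* Boundary faces hold the inputs: a[i][0][k] = A(i,k),
   b[0][j][k] = B(k,j), c[i][j][0] = 0. After the sweep,
   c[i][j][N] = (A*B)(i,j). 1-based indices as in the program. */
#define N 4
static double a[N+1][N+1][N+1], b[N+1][N+1][N+1], c[N+1][N+1][N+1];

void matmul_recurrences(void) {
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j)
            for (int k = 1; k <= N; ++k) {
                a[i][j][k] = a[i][j-1][k];                 /* a: a[i,j-1,k] */
                b[i][j][k] = b[i-1][j][k];                 /* b: b[i-1,j,k] */
                c[i][j][k] = c[i][j][k-1]
                           + a[i][j-1][k] * b[i-1][j][k];  /* c recurrence */
            }
}

The $a$ and $b$ recurrences simply propagate values along $j$ and $i$, while $c$ accumulates along $k$; any loop order respecting those three dependences is equally valid, which is exactly the flexibility the partial order provides.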

In the next segment, let’s explore these execution patterns.

1: Derivation of Domain Flow Matmul

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/freeschedule/index.html b/introduction/freeschedule/index.html index d2cec28..ad5940e 100644 --- a/introduction/freeschedule/index.html +++ b/introduction/freeschedule/index.html @@ -1,5 +1,5 @@ Free Schedule - Domain Flow Architecture -

Free Schedule

Free Schedule

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/index.html b/introduction/index.html index 1f940ef..1bc300a 100644 --- a/introduction/index.html +++ b/introduction/index.html @@ -3,16 +3,16 @@ High-performance, low-latency, energy-efficient computation is particularly important for the emerging application class of autonomous intelligent systems.">Domain Flow Algorithms - Domain Flow Architecture -

Domain Flow Algorithms

Domain Flow algorithms are parallel algorithms that incorporate the constraints of space and time. +

Domain Flow Algorithms

Domain Flow algorithms are parallel algorithms that incorporate the constraints of space and time. By honoring the delay that is inherent to exchanging information between two spatially separate computation or storage sites, domain flow algorithms can improve performance and energy efficiency compared to sequential programming models that depend on (globally addressable) random access memory.

High-performance, low-latency, energy-efficient computation is particularly important for the emerging application -class of autonomous intelligent systems.

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/linearschedule/index.html b/introduction/linearschedule/index.html index bff3d3a..640d674 100644 --- a/introduction/linearschedule/index.html +++ b/introduction/linearschedule/index.html @@ -7,7 +7,7 @@ Let’s go through the thought experiment what the free schedule demands from a physical system. In the free schedule animation, the propagation recurrences distributing the $A$ and $B$ matrix elements throughout the 3D lattice run ‘ahead’ of the actual computational recurrence calculating the $C$ matrix elements.">Linear Schedules - Domain Flow Architecture -

Linear Schedules

Linear Schedules

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/nextsteps/index.html b/introduction/nextsteps/index.html index 7df637c..43dbb8a 100644 --- a/introduction/nextsteps/index.html +++ b/introduction/nextsteps/index.html @@ -1,11 +1,11 @@ Next Steps - Domain Flow Architecture -

Next Steps

Now that we have a rudimentary understanding of parallel algorithms and their physical -execution, the next step is to learn about what makes for a fast and efficient parallel algorithm.

\ No newline at end of file diff --git a/introduction/parallel-programming/index.html b/introduction/parallel-programming/index.html index abac790..aa1a267 100644 --- a/introduction/parallel-programming/index.html +++ b/introduction/parallel-programming/index.html @@ -1,5 +1,5 @@ Parallel Programming - Domain Flow Architecture -

Parallel Programming

To appreciate the domain flow programming model and what it enables, you need to think about the physical +

Parallel Programming

To appreciate the domain flow programming model and what it enables, you need to think about the physical form a ‘program evaluator’ could take. In the days when a processor occupied the volume of a small room, any physical computational machine was limited to a single computational element. This implied that the execution of any algorithm had to be specified as a complete order in time. @@ -19,12 +19,12 @@ machines mentioned above. Furthermore, the optimal algorithm even changes when the same machine architecture introduces a new, typically faster, implementation. And we are not just talking about simple algorithmic changes, such as loop order or blocking; sometimes even the underlying mathematics needs to change.

Given the complexity of writing parallel algorithms, this one-off nature of parallel algorithm design raised the question: is there a parallel programming model that is invariant to the implementation technology of the machine?

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/spacetime/index.html b/introduction/spacetime/index.html index 0c15657..1712291 100644 --- a/introduction/spacetime/index.html +++ b/introduction/spacetime/index.html @@ -1,5 +1,5 @@ Constraints of Spacetime - Domain Flow Architecture -

Constraints of Spacetime

If you visualize the ‘world’ from the perspective of an operand flowing through a machine, +

Constraints of Spacetime

If you visualize the ‘world’ from the perspective of an operand flowing through a machine, you realize that a physical machine creates a specific spatial constraint for the movement of data. Processing nodes are fixed in space, and information is exchanged between nodes to accomplish some transformation. Nodes consume and generate information, and communication links move information (program and data) between nodes. @@ -22,12 +22,12 @@ the propagation of information. A computational event has to be able to ‘see’ its operands before it can commence. Otherwise stated, the event needs to lie in the future light cone of its operands.

These temporal constraints are further complicated by the fact that man-made structures today do not communicate through free space yet, and the physical communication structure adds additional constraints on the shape and extent of the future cone.

These man-made computational structures are dubbed computational spacetimes.

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/wavefront/index.html b/introduction/wavefront/index.html index e9109e8..ad2f51c 100644 --- a/introduction/wavefront/index.html +++ b/introduction/wavefront/index.html @@ -1,5 +1,5 @@ Wavefronts of Computation - Domain Flow Architecture -

Wavefronts of Computation

Wavefronts of Computation

\ No newline at end of file + 
\ No newline at end of file diff --git a/linearsolvers/index.html b/linearsolvers/index.html index 7dcf471..96e7432 100644 --- a/linearsolvers/index.html +++ b/linearsolvers/index.html @@ -1,10 +1,10 @@ Linear Solvers - Domain Flow Architecture -

Linear Solvers

Solving systems of equations is the impetus for the class of algorithms called linear solvers.

\ No newline at end of file diff --git a/linearsolvers/lu/index.html b/linearsolvers/lu/index.html index 32c62ef..3824f8e 100644 --- a/linearsolvers/lu/index.html +++ b/linearsolvers/lu/index.html @@ -3,15 +3,15 @@ $$A = L \otimes U$$.">Gaussian Elimination - Domain Flow Architecture -

Gaussian Elimination

Gaussian Elimination, also known as $LU$ decomposition, decomposes a linear transformation +

Gaussian Elimination

Gaussian Elimination, also known as $LU$ decomposition, decomposes a linear transformation defined by the matrix $A$ into a lower-triangular matrix $L$, and an upper-triangular matrix $U$ -such that

$$A = L \otimes U$$.
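For reference, a minimal in-place C sketch of the factorization in its Doolittle form, assuming nonzero pivots; practical Gaussian elimination adds (partial) pivoting:

#include <stddef.h>

/* In-place LU factorization of a dense, row-major n x n matrix.
   Afterwards, U occupies the upper triangle and L the strict lower
   triangle, with L's unit diagonal implicit. Assumes nonzero pivots. */
void lu_inplace(size_t n, double* A) {
    for (size_t k = 0; k < n; ++k) {
        for (size_t i = k + 1; i < n; ++i) {
            A[i*n + k] /= A[k*n + k];            /* multiplier l_ik */
            for (size_t j = k + 1; j < n; ++j)   /* rank-1 update */
                A[i*n + j] -= A[i*n + k] * A[k*n + j];
        }
    }
}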

\ No newline at end of file + 
\ No newline at end of file diff --git a/linearsolvers/solvers/index.html b/linearsolvers/solvers/index.html index 2e866e4..2ff772a 100644 --- a/linearsolvers/solvers/index.html +++ b/linearsolvers/solvers/index.html @@ -1,10 +1,10 @@ Linear Solvers - Domain Flow Architecture -

Linear Solvers

Linear solvers are algorithms designed to solve a linear system of equations.

\ No newline at end of file diff --git a/matrixkernels/index.html b/matrixkernels/index.html index c216863..6e37172 100644 --- a/matrixkernels/index.html +++ b/matrixkernels/index.html @@ -1,12 +1,12 @@ Matrix Kernels - Domain Flow Architecture -

Matrix Kernels

Matrix Kernels are important to characterize and classify the underlying system of equations. +

Matrix Kernels

Matrix Kernels are important to characterize and classify the underlying system of equations. Identifying singularity and quantifying the null-space of a matrix are key operations before we can try to solve systems of equations.

\ No newline at end of file + 
\ No newline at end of file diff --git a/matrixkernels/matrixkernels/index.html b/matrixkernels/matrixkernels/index.html index aa4119d..f7895f3 100644 --- a/matrixkernels/matrixkernels/index.html +++ b/matrixkernels/matrixkernels/index.html @@ -7,16 +7,16 @@ $L$ is the vector space of all elements $v$ of $V$ such that $L(v) = 0$, where 0 denotes the zero vector in $W, or more symbolically:">Matrix Kernels - Domain Flow Architecture -

Matrix Kernels

In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the linear subspace +

Matrix Kernels

In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the linear subspace of the domain of the map which is mapped to the zero vector. That is, given a linear map

$$L : V \rightarrow W$$ between two vector spaces $V$ and $W$, the kernel of

$L$ is the vector space of all elements $v$ of $V$ such that $L(v) = 0$, where 0 denotes the zero vector in $W$, or more symbolically:

$$ker(L) = \{ v \in V \hspace1ex | \hspace1ex L(v) = 0\} = L^{-1}(0)$$
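A small worked instance of this definition may help; the matrix below is an illustrative example of a rank-deficient map $L(v) = Av$, whose second row is twice the first:

$$ A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad A \begin{pmatrix} -2 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 + 2 \\ -4 + 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \implies ker(L) = span \left\{ \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\} $$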

\ No newline at end of file + 
\ No newline at end of file diff --git a/search/index.html b/search/index.html index de8c343..3cfbe6f 100644 --- a/search/index.html +++ b/search/index.html @@ -1,11 +1,11 @@ Search - Domain Flow Architecture -

Search

-

\ No newline at end of file diff --git a/tags/algorithm/index.html b/tags/algorithm/index.html index 409721c..fc6f661 100644 --- a/tags/algorithm/index.html +++ b/tags/algorithm/index.html @@ -1,10 +1,10 @@ Algorithm - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/computational-spacetime/index.html b/tags/computational-spacetime/index.html index 0a9a30b..12aa259 100644 --- a/tags/computational-spacetime/index.html +++ b/tags/computational-spacetime/index.html @@ -1,10 +1,10 @@ Computational-Spacetime - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/conditioning/index.html b/tags/conditioning/index.html index 921395c..0bd1357 100644 --- a/tags/conditioning/index.html +++ b/tags/conditioning/index.html @@ -1,10 +1,10 @@ Conditioning - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/derivation/index.html b/tags/derivation/index.html index 2219b52..e2da080 100644 --- a/tags/derivation/index.html +++ b/tags/derivation/index.html @@ -1,10 +1,10 @@ Derivation - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/domain-flow/index.html b/tags/domain-flow/index.html index c89e100..4d86466 100644 --- a/tags/domain-flow/index.html +++ b/tags/domain-flow/index.html @@ -1,10 +1,10 @@ Domain-Flow - Tag - Domain Flow Architecture -

Tag - Domain-Flow

A

  • An Example

C

D

F

L

P

\ No newline at end of file diff --git a/tags/dsp/index.html b/tags/dsp/index.html index cab8a54..01f5a36 100644 --- a/tags/dsp/index.html +++ b/tags/dsp/index.html @@ -1,10 +1,10 @@ Dsp - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/filtering/index.html b/tags/filtering/index.html index 8071ed0..17566a8 100644 --- a/tags/filtering/index.html +++ b/tags/filtering/index.html @@ -1,10 +1,10 @@ Filtering - Tag - Domain Flow Architecture -

Tag - Filtering

D

\ No newline at end of file diff --git a/tags/free-schedule/index.html b/tags/free-schedule/index.html index aab0a21..806542f 100644 --- a/tags/free-schedule/index.html +++ b/tags/free-schedule/index.html @@ -1,10 +1,10 @@ Free-Schedule - Tag - Domain Flow Architecture -

Tag - Free-Schedule

F

\ No newline at end of file diff --git a/tags/identification/index.html b/tags/identification/index.html index b02bbe9..762103b 100644 --- a/tags/identification/index.html +++ b/tags/identification/index.html @@ -1,10 +1,10 @@ Identification - Tag - Domain Flow Architecture -

Tag - Identification

I

\ No newline at end of file diff --git a/tags/index-space/index.html b/tags/index-space/index.html index b1a2455..ed266f2 100644 --- a/tags/index-space/index.html +++ b/tags/index-space/index.html @@ -1,10 +1,10 @@ Index-Space - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/index.html b/tags/index.html index 00c045d..f3ea3e1 100644 --- a/tags/index.html +++ b/tags/index.html @@ -1,10 +1,10 @@ Tags - Domain Flow Architecture - \ No newline at end of file diff --git a/tags/lattice/index.html b/tags/lattice/index.html index 3f5b20d..0597bec 100644 --- a/tags/lattice/index.html +++ b/tags/lattice/index.html @@ -1,10 +1,10 @@ Lattice - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/linear-schedule/index.html b/tags/linear-schedule/index.html index 82a20c8..6fef349 100644 --- a/tags/linear-schedule/index.html +++ b/tags/linear-schedule/index.html @@ -1,10 +1,10 @@ Linear-Schedule - Tag - Domain Flow Architecture -

Tag - Linear-Schedule

L

\ No newline at end of file diff --git a/tags/matrix-multiply/index.html b/tags/matrix-multiply/index.html index b91e65b..a32d351 100644 --- a/tags/matrix-multiply/index.html +++ b/tags/matrix-multiply/index.html @@ -1,10 +1,10 @@ Matrix-Multiply - Tag - Domain Flow Architecture -

Tag - Matrix-Multiply

A

  • An Example

C

D

F

L

P

\ No newline at end of file diff --git a/tags/spectral-analysis/index.html b/tags/spectral-analysis/index.html index 22c173e..cfe60ba 100644 --- a/tags/spectral-analysis/index.html +++ b/tags/spectral-analysis/index.html @@ -1,10 +1,10 @@ Spectral-Analysis - Tag - Domain Flow Architecture -

Tag - Spectral-Analysis

S

\ No newline at end of file diff --git a/tags/transform/index.html b/tags/transform/index.html index f7355e9..10b4dd1 100644 --- a/tags/transform/index.html +++ b/tags/transform/index.html @@ -1,10 +1,10 @@ Transform - Tag - Domain Flow Architecture -

Tag - Transform

T

  • Transforms
\ No newline at end of file