diff --git a/404.html b/404.html index 0fff2f1..0ac17fe 100644 --- a/404.html +++ b/404.html @@ -1,2 +1,2 @@ 404 Page not found - Domain Flow Architecture -

44

Not found

Whoops. Looks like this page doesn't exist ¯\_(ツ)_/¯.

Go to homepage

\ No newline at end of file +

44

Not found

Whoops. Looks like this page doesn't exist ¯\_(ツ)_/¯.

Go to homepage

\ No newline at end of file diff --git a/blas/index.html b/blas/index.html index 9411492..6d19321 100644 --- a/blas/index.html +++ b/blas/index.html @@ -7,16 +7,16 @@ components in computational methods, the investment can pay high dividends.">Basic Linear Algebra - Domain Flow Architecture -

Basic Linear Algebra

Basic Linear Algebra Subroutines are an historically significant set of +

Basic Linear Algebra

Basic Linear Algebra Subroutines are an historically significant set of functions that encapsulate the basic building blocks of a large collection of linear algebra algorithms and implementations.

The BLAS library has proven to be a very productive mechanism to create and disseminate highly optimized numerical libraries to a plethora of computer architectures and machines. Writing high-performance linear -algebra algorithms turns out to be a tenacious problem, but since linear algebra operations are essential
components in computational methods, the investment can pay high dividends.

\ No newline at end of file + 
\ No newline at end of file diff --git a/blas/level1/index.html b/blas/level1/index.html index 863c989..7173767 100644 --- a/blas/level1/index.html +++ b/blas/level1/index.html @@ -7,7 +7,7 @@ vector scale: scalar-vector multiplication: $z = \alpha x \implies (z_i = \alpha x_i)$ vector element addition: $z = x + y \implies (z_i = x_i + y_i)$ vector element multiply: $z = x * y \implies (z_i = x_i * y_i)$ vector dot product: $c = x^Ty \implies ( c = \sum_{i = 1}^n x_i y_i ) $, aka inner-product saxpy, or scalar alpha x plus y, $z = \alpha x + y \implies z_i = \alpha x_i + y_i $ The fifth operator, while technically redundant, makes the expressions of linear algebra algorithms more concise.">BLAS Level 1 - Domain Flow Architecture -

BLAS Level 1

BLAS Level 1 are $\mathcal{O}(N)$ class operators. This makes these operators operand access limited +

BLAS Level 1

BLAS Level 1 are $\mathcal{O}(N)$ class operators. This makes these operators operand-access limited, and they thus require careful distribution in a parallel environment.

There are four basic vector operations, and a fifth convenience operator. Let $ \alpha \in \Bbb{R}, x \in \Bbb{R^n}, y \in \Bbb{R^n}, z \in \Bbb{R^n} $, then:

  1. vector scale: scalar-vector multiplication: $z = \alpha x \implies (z_i = \alpha x_i)$
  2. vector element addition: $z = x + y \implies (z_i = x_i + y_i)$
  3. vector element multiply: $z = x * y \implies (z_i = x_i * y_i)$
  4. vector dot product: $c = x^Ty \implies ( c = \sum_{i = 1}^n x_i y_i ) $, aka inner-product
  5. saxpy, or scalar alpha x plus y, $z = \alpha x + y \implies z_i = \alpha x_i + y_i $

The fifth operator, while technically redundant, makes the expressions of linear algebra algorithms more concise.
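The five operations above can be sketched in a few lines of plain Python. The function names are illustrative only, not actual BLAS entry points (the real interface uses names like sscal, saxpy, and sdot, with in-place, strided semantics).

```python
# A minimal sketch of the five BLAS Level 1 operations listed above.
# Function names are illustrative, not actual BLAS entry points.

def vscale(alpha, x):
    return [alpha * xi for xi in x]                   # 1. z_i = alpha * x_i

def vadd(x, y):
    return [xi + yi for xi, yi in zip(x, y)]          # 2. z_i = x_i + y_i

def vmul(x, y):
    return [xi * yi for xi, yi in zip(x, y)]          # 3. z_i = x_i * y_i

def vdot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))       # 4. c = sum_i x_i * y_i

def saxpy(alpha, x, y):
    return [alpha * xi + yi for xi, yi in zip(x, y)]  # 5. z_i = alpha * x_i + y_i

x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(vdot(x, y))        # -> 32.0
print(saxpy(2.0, x, y))  # -> [6.0, 9.0, 12.0]
```

Note that all five are a single pass over the operands: the $\mathcal{O}(N)$ work is matched by $\mathcal{O}(N)$ operand accesses, which is exactly why these operators are access limited.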

One class of domain flow programs for these operators assumes a linear distribution of the vectors, @@ -51,12 +51,12 @@ z: alpha[i-1,j,k] * x[i,j-1,k] + y[i,j,k-1] }

The final operator, scalar alpha x plus y, or saxpy, combines the vector scale and vector addition operators, and shows the same constraint as the vector scale -operator due to the required propagation (broadcast) of the scaling factor.

\ No newline at end of file diff --git a/blas/level2/index.html b/blas/level2/index.html index c865274..65037c3 100644 --- a/blas/level2/index.html +++ b/blas/level2/index.html @@ -3,7 +3,7 @@ Let $A \in \Bbb{R^{mxn}}$, the matrix-vector product is defined as: $$z = Ax, \space where \space x \in \Bbb{R^n}$$">BLAS Level 2 - Domain Flow Architecture -

BLAS Level 2

BLAS Level 2 are $\mathcal{O}(N^2)$ class operators, still operand access +

BLAS Level 2

BLAS Level 2 are $\mathcal{O}(N^2)$ class operators, still operand-access limited as we need to fetch multiple operands per operation without any reuse. The core operator is the matrix-vector multiplication in all its different forms, specialized for matrix shape — triangular, banded, symmetric — and matrix type — integer, real, complex, @@ -15,12 +15,12 @@ x: x[i,j-1,k] z: a[i,j,k-1] * x[i,j-1,k] }

Banded, symmetric, and triangular versions simply alter the constraint set of the domains of -computation: the fundamental dependencies do not change.
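A sequential sketch of the core Level 2 operation makes the access pattern explicit: each matrix element is fetched exactly once and never reused, so the operator is bound by operand access rather than arithmetic. The plain-Python form below is illustrative only.

```python
# Illustrative matrix-vector product z = A x: m*n multiply-adds over
# m*n matrix operands, so each A[i][j] is fetched once, with no reuse.
def gemv(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

A = [[1.0, 2.0],
     [3.0, 4.0]]
print(gemv(A, [1.0, 1.0]))  # -> [3.0, 7.0]
```

The banded, symmetric, and triangular specializations would only restrict which (i, j) pairs contribute to the sums; the dependence of z on rows of A and on x is unchanged, mirroring the constraint-set observation above.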

\ No newline at end of file diff --git a/blas/level3/index.html b/blas/level3/index.html index 4cbc6e2..9e8299d 100644 --- a/blas/level3/index.html +++ b/blas/level3/index.html @@ -11,7 +11,7 @@ In addition to matrix-matrix multiply there are the Rank-k update operators, which are outer-products and matrix additions. Here is a Hermitian Rank-k update: $$ C = \alpha A A^T + \beta C, \space where \space C \space is \space Hermitian. $$ A Hermitian matrix is defined as a matrix that is equal to its Hermitian conjugate. In other words, the matrix $C$ is Hermitian if and only if $C = C^H$. Obviously a Hermitian matrix must be square. Hermitian matrices can be understood as the complex extension of real symmetric matrices.">BLAS Level 3 - Domain Flow Architecture -

BLAS Level 3

BLAS Level 3 are $\mathcal{O}(N^3)$ operators, and finally compute bound +

BLAS Level 3

BLAS Level 3 are $\mathcal{O}(N^3)$ operators, and finally compute bound, creating many opportunities to optimize operand reuse.

In addition to matrix-matrix multiply there are the Rank-k update operators, which are outer-products and matrix additions.

Here is a Hermitian Rank-k update:

$$ C = \alpha A A^H + \beta C, \quad \text{where } C \text{ is Hermitian.} $$

A Hermitian matrix is defined as a matrix that is equal to its Hermitian conjugate. In other words, the matrix $C$ is Hermitian if and only if $C = C^H$. Obviously a Hermitian @@ -32,12 +32,12 @@ c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k] } } -

Here we introduce a conditional constraint that impacts the domain of computation for a set of equations.
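As a sequential reference point for the domain flow version, here is a sketch of the rank-k update written for the real specialization, where $A A^T$ suffices and Hermitian reduces to symmetric; the result then satisfies $C_{ij} = C_{ji}$.

```python
# Sequential sketch of the rank-k update C <- alpha * A A^T + beta * C
# for the real (symmetric) case; A is n x k, C is n x n.
def rank_k_update(alpha, A, beta, C):
    n, k = len(A), len(A[0])
    return [[alpha * sum(A[i][p] * A[j][p] for p in range(k)) + beta * C[i][j]
             for j in range(n)]
            for i in range(n)]

A = [[1.0, 2.0],
     [3.0, 4.0]]
C0 = [[0.0, 0.0], [0.0, 0.0]]
C = rank_k_update(1.0, A, 0.0, C0)
print(C)  # -> [[5.0, 11.0], [11.0, 25.0]]  (symmetric, as expected)
```

The triple loop performs $\mathcal{O}(n^2 k)$ multiply-adds over only $\mathcal{O}(nk + n^2)$ operands, which is the reuse opportunity that makes Level 3 operators compute bound.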

\ No newline at end of file diff --git a/categories/analyzing/index.html b/categories/analyzing/index.html index 01240fe..9c5b13e 100644 --- a/categories/analyzing/index.html +++ b/categories/analyzing/index.html @@ -1,10 +1,10 @@ Analyzing - Category - Domain Flow Architecture -

Category - Analyzing

S

\ No newline at end of file diff --git a/categories/conditioning/index.html b/categories/conditioning/index.html index fb648d0..41ee522 100644 --- a/categories/conditioning/index.html +++ b/categories/conditioning/index.html @@ -1,10 +1,10 @@ Conditioning - Category - Domain Flow Architecture -

Category - Conditioning

S

\ No newline at end of file diff --git a/categories/design/index.html b/categories/design/index.html index 3eb5530..fec733d 100644 --- a/categories/design/index.html +++ b/categories/design/index.html @@ -1,10 +1,10 @@ Design - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/domain-flow/index.html b/categories/domain-flow/index.html index 10827b0..2c2febf 100644 --- a/categories/domain-flow/index.html +++ b/categories/domain-flow/index.html @@ -1,10 +1,10 @@ Domain-Flow - Category - Domain Flow Architecture -

Category - Domain-Flow

A

  • An Example

C

D

F

L

P

\ No newline at end of file diff --git a/categories/dsp/index.html b/categories/dsp/index.html index fbe9604..a6fc5af 100644 --- a/categories/dsp/index.html +++ b/categories/dsp/index.html @@ -1,10 +1,10 @@ Dsp - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/filtering/index.html b/categories/filtering/index.html index 7f59cbc..540019b 100644 --- a/categories/filtering/index.html +++ b/categories/filtering/index.html @@ -1,10 +1,10 @@ Filtering - Category - Domain Flow Architecture -

Category - Filtering

D

\ No newline at end of file diff --git a/categories/identification/index.html b/categories/identification/index.html index 0fb0f06..96964a9 100644 --- a/categories/identification/index.html +++ b/categories/identification/index.html @@ -1,10 +1,10 @@ Identification - Category - Domain Flow Architecture -

Category - Identification

I

\ No newline at end of file diff --git a/categories/index.html b/categories/index.html index 14d82bd..6467f59 100644 --- a/categories/index.html +++ b/categories/index.html @@ -1,10 +1,10 @@ Categories - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/introduction/index.html b/categories/introduction/index.html index 2770287..9cd1b9a 100644 --- a/categories/introduction/index.html +++ b/categories/introduction/index.html @@ -1,10 +1,10 @@ Introduction - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/matrix-math/index.html b/categories/matrix-math/index.html index 781539d..76c7f45 100644 --- a/categories/matrix-math/index.html +++ b/categories/matrix-math/index.html @@ -1,10 +1,10 @@ Matrix-Math - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/schedule/index.html b/categories/schedule/index.html index 7ea787b..3368fb8 100644 --- a/categories/schedule/index.html +++ b/categories/schedule/index.html @@ -1,10 +1,10 @@ Schedule - Category - Domain Flow Architecture -

Category - Schedule

F

L

\ No newline at end of file diff --git a/categories/spacetime/index.html b/categories/spacetime/index.html index b3b195d..63c86f2 100644 --- a/categories/spacetime/index.html +++ b/categories/spacetime/index.html @@ -1,10 +1,10 @@ Spacetime - Category - Domain Flow Architecture -
\ No newline at end of file diff --git a/categories/transforming/index.html b/categories/transforming/index.html index 7bb52bf..2a921aa 100644 --- a/categories/transforming/index.html +++ b/categories/transforming/index.html @@ -1,10 +1,10 @@ Transforming - Category - Domain Flow Architecture -

Category - Transforming

T

  • Transforms
\ No newline at end of file diff --git a/contentdev/index.html b/contentdev/index.html index ccf13b4..29bccfb 100644 --- a/contentdev/index.html +++ b/contentdev/index.html @@ -1,11 +1,11 @@ Content Development - Domain Flow Architecture -

Content Development

The following pages are examples for content developers to quickly add interactive -content that aids in understanding parallel algorithm design and optimization.

\ No newline at end of file diff --git a/contentdev/prototype/index.html b/contentdev/prototype/index.html index 133cd9e..fbcb220 100644 --- a/contentdev/prototype/index.html +++ b/contentdev/prototype/index.html @@ -3,7 +3,7 @@ All you need is a tag with an id and some CSS styling and a call into an animation program that fills that canvas.">Prototype - Domain Flow Architecture -

Prototype

Prototype

This is a basic skeleton of a Hugo Markdown page that includes a three.js animation.

All you need is a <canvas> tag with an id and some CSS styling and a call into -an animation program that fills that canvas.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/currentstate/index.html b/design/currentstate/index.html index ade2ed3..08e2283 100644 --- a/design/currentstate/index.html +++ b/design/currentstate/index.html @@ -1,5 +1,5 @@ Computational Dynamics - Domain Flow Architecture -

Computational Dynamics

A memory access in a physical machine can be very complex. For example, +

Computational Dynamics

A memory access in a physical machine can be very complex. For example, when a program accesses an operand located at an address that is not in physical memory, the processor registers a page fault. The performance difference between an access from the local L1 cache and a page fault @@ -33,12 +33,12 @@ modulation due to power constraints, causes the collective to wait for the slowest process. As the number of processors grows, so does variability. And unfortunately, when variability rises, processor -utilization drops and algorithmic performance suffers.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/dfa/index.html b/design/dfa/index.html index e26aa28..e9f864c 100644 --- a/design/dfa/index.html +++ b/design/dfa/index.html @@ -1,5 +1,5 @@ Domain Flow Architecture -

Domain Flow Architecture

Domain Flow Architecture (DFA) machines are the class of machines that execute +

Domain Flow Architecture

Domain Flow Architecture (DFA) machines are the class of machines that execute using the domain flow execution model. The fundamental problem limiting the energy efficiency of the data flow machine is the size of the CAM and fabric. As they are managed as two separate clusters of resources, @@ -15,12 +15,12 @@ exhibit partial orders that are regular and are separated in space. That is a mouthful, but we can make this more tangible when we discuss in more detail the temporal behavior of a domain flow program in the -next section about time.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/dfm/index.html b/design/dfm/index.html index 5ae318d..acafe50 100644 --- a/design/dfm/index.html +++ b/design/dfm/index.html @@ -3,7 +3,7 @@ write an operand into an appropriate operand slot in an instruction token stored in a Content Addressable Memory (CAM) by an instruction tag check if all operands are present to start the execution cycle of the instruction if an instruction is ready then extract it from the CAM and inject it into a fabric of computational elements deliver the instruction to an available execution unit execute the instruction, and finally write the result back into an operand slot in target instruction token stored in the CAM The strength of the resource contention management of the Data Flow Machine is that the machine can execute along the free schedule, that is, the inherent parallelism of the algorithm. Any physical implementation, however, is constrained by the energy-efficiency of the CAM and the network that connects the CAM to the fabric of computational elements. As concurrency demands grow the efficiency of both the CAM and the fabric decreases making large data flow machines unattractive. However, small data flow machines don’t have this problem and are able to deliver energy-efficient, low-latency resource management. Today, all high-performance microprocessors have a data flow machine at their core.">Data Flow Machine - Domain Flow Architecture -

Data Flow Machine

In the late 60’s and 70’s when computer scientists were exploring parallel +

Data Flow Machine

In the late 60’s and 70’s when computer scientists were exploring parallel computation by building the first parallel machines and developing the parallel algorithm complexity theory, folk realized that this over-constrained specification was a real problem for concurrency. @@ -18,12 +18,12 @@ decreases making large data flow machines unattractive. However, small data flow machines don’t have this problem and are able to deliver energy-efficient, low-latency resource management. Today, all high-performance microprocessors -have a data flow machine at their core.
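The firing rule described above can be made concrete with a toy model. The class below is a sketch under the assumption of single-use instruction tokens; it models the CAM tag check and extraction, not any real machine's implementation.

```python
# Toy model of the data flow firing rule: operands are written into
# slots of instruction tokens held in a CAM keyed by instruction tag;
# a token fires (executes) once all of its operand slots are filled.
class DataFlowCAM:
    def __init__(self):
        self.tokens = {}  # tag -> {'op': callable, 'slots': [None, ...]}

    def add_instruction(self, tag, op, arity):
        self.tokens[tag] = {'op': op, 'slots': [None] * arity}

    def write_operand(self, tag, slot, value):
        tok = self.tokens[tag]
        tok['slots'][slot] = value
        if all(s is not None for s in tok['slots']):  # tag check: ready?
            del self.tokens[tag]                      # extract from the CAM
            return tok['op'](*tok['slots'])           # execute on the fabric
        return None                                   # still waiting

cam = DataFlowCAM()
cam.add_instruction('mul', lambda a, b: a * b, 2)
cam.write_operand('mul', 0, 6)         # one operand present: not ready
print(cam.write_operand('mul', 1, 7))  # -> 42
```

Because tokens fire purely on operand availability, execution follows the free schedule; the cost hidden by this toy model is exactly the associative lookup per operand write that limits how large a real CAM can grow.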

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/elements/index.html b/design/elements/index.html index f1fb2e9..53f7795 100644 --- a/design/elements/index.html +++ b/design/elements/index.html @@ -15,7 +15,7 @@ Item #2 is well-known among high-performance algorithm designers. Item #3 is well-known among hardware designers and computer engineers. When designing domain flow algorithms, we are looking for an energy efficient embedding of a computational graph in space, and it is thus to be expected that we need to combine all three attributes of minimizing operator count, operand movement, and resource contention. The complexity of minimizing resource contention is what makes hardware design so much more complex. But the complexity of operator contention can be mitigated by clever resource contention management.">Elements of Design - Domain Flow Architecture -

Elements of Design

We can summarize the attributes of good parallel algorithm design as

  1. low operation count, where operation count is defined as the sum of operators and operand accesses
  2. minimal operand movement
  3. minimal resource contention

Item #1 is well-known by theoretical computer scientists.

Item #2 is well-known among high-performance algorithm designers.

Item #3 is well-known among hardware designers and computer engineers.

When designing domain flow algorithms, we are looking for an energy +

Elements of Design

We can summarize the attributes of good parallel algorithm design as

  1. low operation count, where operation count is defined as the sum of operators and operand accesses
  2. minimal operand movement
  3. minimal resource contention

Item #1 is well-known by theoretical computer scientists.

Item #2 is well-known among high-performance algorithm designers.

Item #3 is well-known among hardware designers and computer engineers.

When designing domain flow algorithms, we are looking for an energy efficient embedding of a computational graph in space, and it is thus to be expected that we need to combine all three attributes of minimizing operator count, operand movement, and resource contention. @@ -31,12 +31,12 @@ it forces a total order on the computation graph. This task of creating the total order falls on the algorithm designer.

For parallel execution we need a resource contention management mechanism that is more efficient. And this is where our -computational spacetime will come in handy.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/index.html b/design/index.html index b18f27b..7539b73 100644 --- a/design/index.html +++ b/design/index.html @@ -1,17 +1,17 @@ Elements of Good Design - Domain Flow Architecture -

Elements of Good Design

The best algorithms for sequential execution are those that minimize the number +

Elements of Good Design

The best algorithms for sequential execution are those that minimize the number of operations to yield results. Computational complexity theory has aided this quest, but any performance-minded algorithm designer knows that the best theoretical algorithms are not necessarily the fastest when executed on real hardware. The difference is typically caused by the trade-off sequential algorithms have to make between computation and accessing memory. The constraints of data movement are even more pronounced in parallel algorithms as demonstrated in the previous section.

This chapter explores the elements of good design for parallel algorithms and their -execution on real hardware.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/index.xml b/design/index.xml index fb0bf50..03e9b57 100644 --- a/design/index.xml +++ b/design/index.xml @@ -4,5 +4,5 @@ Item #2 is well-known among high-performance algorithm designers. Item #3 is well-known among hardware designers and computer engineers. When designing domain flow algorithms, we are looking for an energy efficient embedding of a computational graph in space, and it is thus to be expected that we need to combine all three attributes of minimizing operator count, operand movement, and resource contention. The complexity of minimizing resource contention is what makes hardware design so much more complex. But the complexity of operator contention can be mitigated by clever resource contention management.Data Flow Machinehttps://stillwater-sc.github.io/domain-flow/design/dfm/index.htmlFri, 17 Feb 2017 09:20:57 -0500https://stillwater-sc.github.io/domain-flow/design/dfm/index.htmlIn the late 60’s and 70’s when computer scientists were exploring parallel computation by building the first parallel machines and developing the parallel algorithm complexity theory, folk realized that this over-constrained specification was a real problem for concurrency. One proposal to rectify this was a natively parallel execution model called the Data Flow Machine (DFM). 
A Data Flow Machine uses a different resource contention management mechanism: write an operand into an appropriate operand slot in an instruction token stored in a Content Addressable Memory (CAM) by an instruction tag check if all operands are present to start the execution cycle of the instruction if an instruction is ready then extract it from the CAM and inject it into a fabric of computational elements deliver the instruction to an available execution unit execute the instruction, and finally write the result back into an operand slot in target instruction token stored in the CAM The strength of the resource contention management of the Data Flow Machine is that the machine can execute along the free schedule, that is, the inherent parallelism of the algorithm. Any physical implementation, however, is constrained by the energy-efficiency of the CAM and the network that connects the CAM to the fabric of computational elements. As concurrency demands grow the efficiency of both the CAM and the fabric decreases making large data flow machines unattractive. However, small data flow machines don’t have this problem and are able to deliver energy-efficient, low-latency resource management. Today, all high-performance microprocessors have a data flow machine at their core.Domain Flow Architecturehttps://stillwater-sc.github.io/domain-flow/design/dfa/index.htmlFri, 17 Feb 2017 09:20:57 -0500https://stillwater-sc.github.io/domain-flow/design/dfa/index.htmlDomain Flow Architecture (DFA) machines are the class of machines that execute using the domain flow execution model. The fundamental problem limiting the energy efficiency of the data flow machine is the size of the CAM and fabric. As they are managed as two separate clusters of resources, they grow together. 
The domain flow execution model recognizes that for an important class of algorithms, we can distribute the CAM across the computational elements in the fabric, and we can scale the machine without negatively impacting the cycle time of execution.Time: the whenhttps://stillwater-sc.github.io/domain-flow/design/time/index.htmlWed, 15 Feb 2017 07:48:27 -0500https://stillwater-sc.github.io/domain-flow/design/time/index.htmlThe free schedule represents the inherent concurrency of the parallel algorithm, and, under the assumption of infinite resources, it is the fastest schedule possible. -Let x be a computation that uses y as input, then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} 1, & \text{if y is an external input}\\ 1 + max(T(y)) & \text{otherwise} \end{cases} \end{equation} The free schedule is defined at the level of the individual operations. It does not provide any information about the global data movement or the global structure of the interactions between data and operation. Moreover, the above equation describes a logical sequencing of operations, it does not specify a physical evolution.Space: the wherehttps://stillwater-sc.github.io/domain-flow/design/space/index.htmlWed, 15 Feb 2017 07:49:38 -0500https://stillwater-sc.github.io/domain-flow/design/space/index.htmlSpaceNext Stepshttps://stillwater-sc.github.io/domain-flow/design/nextsteps/index.htmlWed, 15 Feb 2017 07:49:53 -0500https://stillwater-sc.github.io/domain-flow/design/nextsteps/index.htmlIn this short introduction to parallel algorithms in general and domain flow in particular, our next step is to look at specific algorithms, and explore their optimal parallel execution dynamics. +Let x be a computation that uses y as input, then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} 1, & \text{if y is an external input}\\ 1 + max(T(y)) & \text{otherwise} \end{cases} \end{equation} The free schedule is defined at the level of the individual operations. 
It does not provide any information about the global data movement or the global structure of the interactions between data and operation. Moreover, the above equation describes a logical sequencing of operations, it does not specify a physical evolution.Space: the wherehttps://stillwater-sc.github.io/domain-flow/design/space/index.htmlWed, 15 Feb 2017 07:49:38 -0500https://stillwater-sc.github.io/domain-flow/design/space/index.htmlSpace is a scarce resource, with a direct cost associated to it. A computational engine, such as a Stored Program Machine, needs to allocate area for ALUs and register files, and to make these work well, even more space is required to surround these resources with cache hierarchies and memory controllers. But even if space was freely available, it still presents a cost from a parallel computational perspective, since it takes energy to get information across space, as it takes time to do so.Next Stepshttps://stillwater-sc.github.io/domain-flow/design/nextsteps/index.htmlWed, 15 Feb 2017 07:49:53 -0500https://stillwater-sc.github.io/domain-flow/design/nextsteps/index.htmlIn this short introduction to parallel algorithms in general and domain flow in particular, our next step is to look at specific algorithms, and explore their optimal parallel execution dynamics. Once we get a good collection of fast, and energy efficient algorithms together, we can start to explore how best to engineer combinations of these operators. We will discover that sometimes, the cost of an information exchange makes a whole class of algorithms unattractive for parallel executions. With that insight comes the need to create new algorithms and sometimes completely new mathematical approaches to properly leverage the available resources. 
\ No newline at end of file diff --git a/design/nextsteps/index.html b/design/nextsteps/index.html index f8eb786..cd3675b 100644 --- a/design/nextsteps/index.html +++ b/design/nextsteps/index.html @@ -3,19 +3,19 @@ Once we get a good collection of fast, and energy efficient algorithms together, we can start to explore how best to engineer combinations of these operators. We will discover that sometimes, the cost of an information exchange makes a whole class of algorithms unattractive for parallel executions. With that insight comes the need to create new algorithms and sometimes completely new mathematical approaches to properly leverage the available resources.">Next Steps - Domain Flow Architecture -

Next Steps

In this short introduction to parallel algorithms in general and domain flow +

Next Steps

In this short introduction to parallel algorithms in general and domain flow in particular, our next step is to look at specific algorithms, and explore their optimal parallel execution dynamics.

Once we get a good collection of fast and energy-efficient algorithms together, we can start to explore how best to engineer combinations of these operators. We will discover that sometimes the cost of an information exchange makes a whole class of algorithms unattractive for parallel execution. With that insight comes the need to create new algorithms and sometimes completely new -mathematical approaches to properly leverage the available resources.

\ No newline at end of file + 
\ No newline at end of file diff --git a/design/space/index.html b/design/space/index.html index ea6b5c8..295dfe4 100644 --- a/design/space/index.html +++ b/design/space/index.html @@ -1,10 +1,41 @@ -Space: the where - Domain Flow Architecture -

Space: the where

Space

\ No newline at end of file diff --git a/design/time/index.html b/design/time/index.html index aed4b66..5521709 100644 --- a/design/time/index.html +++ b/design/time/index.html @@ -3,7 +3,7 @@ Let x be a computation that uses y as input, then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} 1, & \text{if y is an external input}\\ 1 + max(T(y)) & \text{otherwise} \end{cases} \end{equation} The free schedule is defined at the level of the individual operations. It does not provide any information about the global data movement or the global structure of the interactions between data and operation. Moreover, the above equation describes a logical sequencing of operations, it does not specify a physical evolution.">Time: the when - Domain Flow Architecture -

Time: the when

The free schedule represents the inherent concurrency of the parallel algorithm, and, under the assumption +

Time: the when

The free schedule represents the inherent concurrency of the parallel algorithm, and, under the assumption of infinite resources, it is the fastest schedule possible.

Let x be a computation that uses y as input, then the free schedule is defined as: \begin{equation} T(x) =\begin{cases} @@ -66,12 +66,12 @@ recurrence equations has practical application for the design of optimal computational data paths. The Domain Flow model uses the Karp, Miller, and Winograd piecewise linear scheduling -construction to sequence activity wavefronts.
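The recursive definition of the free schedule is straightforward to evaluate over an explicit dependence graph. The sketch below assumes the graph is acyclic and given as a map from each computation to its inputs (an empty list marking an external input); external inputs fire at $t = 1$, and every other computation fires one step after its latest input arrives.

```python
# Free schedule over a dependence DAG: T(x) = 1 for external inputs,
# otherwise T(x) = 1 + max over inputs y of T(y).
def free_schedule(deps):
    T = {}
    def t(x):
        if x not in T:
            T[x] = 1 if not deps[x] else 1 + max(t(y) for y in deps[x])
        return T[x]
    for x in deps:
        t(x)
    return T

# a, b are external inputs; c consumes both; d consumes c and a
deps = {'a': [], 'b': [], 'c': ['a', 'b'], 'd': ['c', 'a']}
print(free_schedule(deps))  # -> {'a': 1, 'b': 1, 'c': 2, 'd': 3}
```

Consistent with the text, this computes only a logical sequencing per operation; nothing in `T` says where the operations sit in space or how operands move between them.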

\ No newline at end of file + 
\ No newline at end of file diff --git a/dsp/conditioning/index.html b/dsp/conditioning/index.html index 99751ac..4b92feb 100644 --- a/dsp/conditioning/index.html +++ b/dsp/conditioning/index.html @@ -1,10 +1,10 @@ Signal Conditioning - Domain Flow Architecture -
\ No newline at end of file diff --git a/dsp/filters/index.html b/dsp/filters/index.html index 77f4d23..7ad231b 100644 --- a/dsp/filters/index.html +++ b/dsp/filters/index.html @@ -1,10 +1,10 @@ Digital Filtering - Domain Flow Architecture -
\ No newline at end of file diff --git a/dsp/identification/index.html b/dsp/identification/index.html index 1bc7510..a3f1bb5 100644 --- a/dsp/identification/index.html +++ b/dsp/identification/index.html @@ -3,7 +3,7 @@ When there are signals and noises, physicists try to identify signals by modeling them, whereas statisticians oppositely try to model noise to identify signals. In this study, we applied the statisticians’ concept of signal detection of physics data with small-size samples and high dimensions without modeling the signals. Most of the data in nature, whether noises or signals, are assumed to be generated by dynamical systems; thus, there is essentially no distinction between these generating processes. We propose that the correlation length of a dynamical system and the number of samples are crucial for the practical definition of noise variables among the signal variables generated by such a system. Since variables with short-term correlations reach normal distributions faster as the number of samples decreases, they are regarded to be noise-like variables, whereas variables with opposite properties are signal-like variables. Normality tests are not effective for data of small-size samples with high dimensions. Therefore, we modeled noises on the basis of the property of a noise variable, that is, the uniformity of the histogram of the probability that a variable is a noise. We devised a method of detecting signal variables from the structural change of the histogram according to the decrease in the number of samples. We applied our method to the data generated by globally coupled map, which can produce time series data with different correlation lengths, and also applied to gene expression data, which are typical static data of small-size samples with high dimensions, and we successfully detected signal variables from them.">Identification - Domain Flow Architecture -

Identification

Identification is the act of recognizing the signal in the presence of noise.

When there are signals and noises, physicists try to identify signals by modeling them, +

Identification

Identification is the act of recognizing the signal in the presence of noise.

When there are signals and noises, physicists try to identify signals by modeling them, whereas statisticians oppositely try to model noise to identify signals. In this study, we applied the statisticians’ concept of signal detection of physics data with small-size samples and high dimensions without modeling the signals. Most of the data in nature, @@ -22,12 +22,12 @@ to the data generated by globally coupled map, which can produce time series data with different correlation lengths, and also applied to gene expression data, which are typical static data of small-size samples with high dimensions, and we successfully -detected signal variables from them.

\ No newline at end of file + 
\ No newline at end of file diff --git a/dsp/index.html b/dsp/index.html index 1867b1a..a27c63b 100644 --- a/dsp/index.html +++ b/dsp/index.html @@ -1,13 +1,13 @@ Digital Signal Processing - Domain Flow Architecture -

Digital Signal Processing

Digital Signal Processing is the discrete realization of Analog Signal Processing +

Digital Signal Processing

Digital Signal Processing is the discrete realization of Analog Signal Processing operations used to condition, amplify, characterize, and transform signals. Digital Signal Processing is essential when interfacing a digital computer -to a physical process to enable reproducible and high-fidelity applications.
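As one concrete instance of such a conditioning operation, here is a sketch of a moving-average FIR filter; the tap count and the `moving_average` name are illustrative choices, not taken from this site.

```python
import numpy as np

# A 4-tap moving-average FIR filter: a simple conditioning operation
# that attenuates high-frequency noise in a sampled signal.
def moving_average(x, taps=4):
    h = np.ones(taps) / taps              # impulse response
    return np.convolve(x, h, mode="valid")

x = np.array([0., 0., 4., 0., 0., 0., 4., 0.])
print(moving_average(x))                  # each output averages 4 samples
```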

\ No newline at end of file + 
\ No newline at end of file diff --git a/dsp/spectral/index.html b/dsp/spectral/index.html index 453dab0..51d3c85 100644 --- a/dsp/spectral/index.html +++ b/dsp/spectral/index.html @@ -1,10 +1,10 @@ Spectral Analysis - Domain Flow Architecture -
\ No newline at end of file diff --git a/dsp/transforms/index.html b/dsp/transforms/index.html index 741953f..8734642 100644 --- a/dsp/transforms/index.html +++ b/dsp/transforms/index.html @@ -1,10 +1,10 @@ Transforms - Domain Flow Architecture -
\ No newline at end of file diff --git a/factorization/factorization/index.html b/factorization/factorization/index.html index 1f5f145..510d7c3 100644 --- a/factorization/factorization/index.html +++ b/factorization/factorization/index.html @@ -3,12 +3,12 @@ $$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$">Matrix Factorizations - Domain Flow Architecture -

Matrix Factorizations

This is the quadratic formula:

$$ x = {-b \pm \sqrt{b^2-4ac} \over 2a} $$
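As a worked instance, the formula translates directly into code; this sketch assumes a non-negative discriminant, and `quadratic_roots` is an illustrative name.

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 via the quadratic formula
    (real-discriminant case only)."""
    disc = b * b - 4 * a * c
    r = math.sqrt(disc)
    return ((-b + r) / (2 * a), (-b - r) / (2 * a))

print(quadratic_roots(1, -3, 2))  # x^2 - 3x + 2 = (x - 1)(x - 2)
```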
\ No newline at end of file diff --git a/factorization/index.html b/factorization/index.html index 83f1023..786acf5 100644 --- a/factorization/index.html +++ b/factorization/index.html @@ -1,12 +1,12 @@ Matrix Factorization - Domain Flow Architecture -

Matrix Factorization

Matrix factorizations are the work horse of linear algebra applications. +

Matrix Factorization

Matrix factorizations are the workhorse of linear algebra applications. Factorizations create equivalences that improve the usability or robustness -of an algorithm.

\ No newline at end of file + 
\ No newline at end of file diff --git a/images/flynn-taxonomy.png b/images/flynn-taxonomy.png new file mode 100644 index 0000000..1d6cd8f Binary files /dev/null and b/images/flynn-taxonomy.png differ diff --git a/introduction/computational-spacetime/index.html b/introduction/computational-spacetime/index.html index 59b0855..8a336ae 100644 --- a/introduction/computational-spacetime/index.html +++ b/introduction/computational-spacetime/index.html @@ -1,5 +1,5 @@ Computational Spacetime - Domain Flow Architecture -

Computational Spacetime

Computational Spacetime

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/derivation/index.html b/introduction/derivation/index.html index cfd27c6..143a5ef 100644 --- a/introduction/derivation/index.html +++ b/introduction/derivation/index.html @@ -3,7 +3,7 @@ The Linear Algebra universe is particularly rich in partial orders, something that has been exploited for centuries 1. Matrix Computations2 by Golub, and van Loan provide a comprehensive review. What follows may be a bit technical, but keep in mind the visualizations of the previous pages as you try to visualize what the math implies.">Derivation of the matrix multiply domain flow program - Domain Flow Architecture -

Derivation of the matrix multiply domain flow program

The concepts of partial and total orders are essential for finding optimal domain flow algorithms. +

Derivation of the matrix multiply domain flow program

The concepts of partial and total orders are essential for finding optimal domain flow algorithms. Partial orders, or posets, are the source of high-performance, low-power execution patterns.

The Linear Algebra universe is particularly rich in partial orders, something that has been exploited for centuries 1. Matrix Computations2 by Golub and van Loan provides a comprehensive review. What follows may be a bit technical, but keep in mind the visualizations of the previous pages as you try to picture what the math implies. @@ -83,12 +83,12 @@ b: b[i-1,j,k] c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k] } -

1: History of Matrices and Determinants

2: Matrix Computations, Gene Golub and Charles van Loan

\ No newline at end of file diff --git a/introduction/domain-flow/index.html b/introduction/domain-flow/index.html index b7132d5..45f5d5a 100644 --- a/introduction/domain-flow/index.html +++ b/introduction/domain-flow/index.html @@ -7,7 +7,7 @@ Implementation technology will impact these phases differently, and we are seeking a programming model that is invariant to the difference. A thought experiment will shed light on the desired properties of such a model.">Domain Flow - Domain Flow Architecture -

Domain Flow

Domain Flow

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/example/index.html b/introduction/example/index.html index a21d033..3861cfc 100644 --- a/introduction/example/index.html +++ b/introduction/example/index.html @@ -3,7 +3,7 @@ compute ( (i,j,k) | 1 <= i,j,k <= N ) { a: a[i,j-1,k] b: b[i-1,j,k] c: c[i,j,k-1] + a[i,j-1,k] * b[i-1,j,k] } The underlying algorithm requires a domain of computation governed by a set of constraints, and a set of computational dependencies that implicitly define a partial order across all the operations in the computation. The partial order is readily visible in the need to have computed the result for $c[i,j,k-1]$ before the computation of $c[i,j,k]$ can commence. In contrast, the $a$ and $b$ recurrences are independent of each other.">An Example - Domain Flow Architecture -

An Example

Let’s look at a simple, but frequently used operator in Deep Learning inference: +

An Example

Let’s look at a simple, but frequently used operator in Deep Learning inference: dense matrix multiplication. A Domain Flow program 1 for this operator is shown below:

compute ( (i,j,k) | 1 <= i,j,k <= N ) {
     a: a[i,j-1,k]
@@ -38,12 +38,12 @@
 where the variable $a$ is defined.

A thorough understanding of the partial and total orders inherent in the parallel computation is essential for finding optimal domain flow algorithms.

High-performance, low-power execution patterns frequently involve a partial order that enables timely reuse of computational results, or creates flexibility to organize just-in-time arrival -of input operands to avoid memory elements.
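The recurrences of the program can be checked in a scalar sketch, with dictionaries playing the role of the single-assignment variables. The boundary injection used here ($A$ entering along the $j=0$ face, $B$ along the $i=0$ face, $c$ zero along $k=0$) is an assumption for illustration, not spelled out by the program above.

```python
import numpy as np

N = 3
A = np.arange(1., N * N + 1).reshape(N, N)
B = np.arange(2., N * N + 2).reshape(N, N)

a, b, c = {}, {}, {}
# assumed boundary conditions on the faces of the index domain
for i in range(1, N + 1):
    for k in range(1, N + 1):
        a[(i, 0, k)] = A[i - 1, k - 1]
for j in range(1, N + 1):
    for k in range(1, N + 1):
        b[(0, j, k)] = B[k - 1, j - 1]
for i in range(1, N + 1):
    for j in range(1, N + 1):
        c[(i, j, 0)] = 0.0

# evaluate the recurrences over the domain 1 <= i,j,k <= N
for i in range(1, N + 1):
    for j in range(1, N + 1):
        for k in range(1, N + 1):
            a[(i, j, k)] = a[(i, j - 1, k)]
            b[(i, j, k)] = b[(i - 1, j, k)]
            c[(i, j, k)] = c[(i, j, k - 1)] + a[(i, j - 1, k)] * b[(i - 1, j, k)]

C = np.array([[c[(i, j, N)] for j in range(1, N + 1)] for i in range(1, N + 1)])
print(np.allclose(C, A @ B))  # True: the recurrences compute C = A * B
```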

In the next segment, let’s explore these execution patterns.

1: Derivation of Domain Flow Matmul

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/freeschedule/index.html b/introduction/freeschedule/index.html index ad5940e..228137a 100644 --- a/introduction/freeschedule/index.html +++ b/introduction/freeschedule/index.html @@ -1,5 +1,5 @@ Free Schedule - Domain Flow Architecture -

Free Schedule

Free Schedule

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/index.html b/introduction/index.html index 1bc300a..28bf971 100644 --- a/introduction/index.html +++ b/introduction/index.html @@ -3,16 +3,16 @@ High-performance, low-latency, energy-efficient computation is particularly important for the emerging application class of autonomous intelligent systems.">Domain Flow Algorithms - Domain Flow Architecture -

Domain Flow Algorithms

Domain Flow algorithms are parallel algorithms that incorporate the constraints of space and time. +

Domain Flow Algorithms

Domain Flow algorithms are parallel algorithms that incorporate the constraints of space and time. By honoring the delay that is inherent to exchanging information between two spatially separate computation or storage sites, domain flow algorithms can improve performance and energy efficiency compared to sequential programming models that depend on (globally addressable) random access memory.

High-performance, low-latency, energy-efficient computation is particularly important for the emerging application -class of autonomous intelligent systems.

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/linearschedule/index.html b/introduction/linearschedule/index.html index 640d674..db8a6e9 100644 --- a/introduction/linearschedule/index.html +++ b/introduction/linearschedule/index.html @@ -7,7 +7,7 @@ Let’s go through the thought experiment what the free schedule demands from a physical system. In the free schedule animation, the propagation recurrences distributing the $A$ and $B$ matrix elements throughout the 3D lattice run ‘ahead’ of the actual computational recurrence calculating the $C$ matrix elements.">Linear Schedules - Domain Flow Architecture -

Linear Schedules

Linear Schedules

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/nextsteps/index.html b/introduction/nextsteps/index.html index 43dbb8a..3332a49 100644 --- a/introduction/nextsteps/index.html +++ b/introduction/nextsteps/index.html @@ -1,11 +1,11 @@ Next Steps - Domain Flow Architecture -

Next Steps

Now that we have a rudimentary understanding of parallel algorithms and their physical -execution, the next step is to learn about what makes for a fast and efficient parallel algorithm.

\ No newline at end of file diff --git a/introduction/parallel-programming/index.html b/introduction/parallel-programming/index.html index aa1a267..8054b11 100644 --- a/introduction/parallel-programming/index.html +++ b/introduction/parallel-programming/index.html @@ -1,5 +1,5 @@ Parallel Programming - Domain Flow Architecture -

Parallel Programming

To appreciate the domain flow programming model and what it enables, you need to think about the physical +

Parallel Programming

To appreciate the domain flow programming model and what it enables, you need to think about the physical form a ‘program evaluator’ could take. In the days when a processor occupied the volume of a small room, any physical computational machine was limited to a single computational element. This implied that the execution of any algorithm had to be specified as a total order in time. @@ -19,12 +19,12 @@ machines mentioned above. Furthermore, the optimal algorithm even changes when the same machine architecture introduces a new, typically faster, implementation. And we are not just talking about simple algorithmic changes, such as loop order or blocking; sometimes even the underlying mathematics needs to change.

Given the complexity of writing parallel algorithms, this one-off nature of parallel algorithm design begged -the question: is there a parallel programming model that is invariant to the implementation technology of the machine?

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/spacetime/index.html b/introduction/spacetime/index.html index 1712291..5ce9ebd 100644 --- a/introduction/spacetime/index.html +++ b/introduction/spacetime/index.html @@ -1,5 +1,5 @@ Constraints of Spacetime - Domain Flow Architecture -

Constraints of Spacetime

If you visualize the ‘world’ from the perspective of an operand flowing through a machine, +

Constraints of Spacetime

If you visualize the ‘world’ from the perspective of an operand flowing through a machine, you realize that a physical machine creates a specific spatial constraint for the movement of data. Processing nodes are fixed in space, and information is exchanged between nodes to accomplish some transformation. Nodes consume and generate information, and communication links move information (program and data) between nodes. @@ -22,12 +22,12 @@ the propagation of information. A computational event has to be able to ‘see’ its operands before it can commence. Otherwise stated, its operands need to lie in the future light cone.

These temporal constraints are further complicated by the fact that man-made structures today do not yet communicate through free space, and the physical communication structure adds additional constraints -on the shape and extent of the future cone.

These man-made computational structures are dubbed computational spacetimes.
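A toy model of these constraints: assume a 2D mesh in which information moves one hop per time step (both the topology and the unit hop delay are assumptions for illustration). An event can consume an operand only if the operand's future cone has reached the event's node.

```python
# Nodes on a 2D mesh, one time step per hop. An event at (p, t) can
# 'see' an operand produced at (q, t_q) only if t >= t_q + dist(p, q).
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def visible(event_node, event_time, src_node, src_time):
    return event_time >= src_time + manhattan(event_node, src_node)

print(visible((3, 3), 5, (0, 0), 0))   # 6 hops away, too early -> False
print(visible((3, 3), 6, (0, 0), 0))   # exactly on the cone -> True
```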

\ No newline at end of file + 
\ No newline at end of file diff --git a/introduction/wavefront/index.html b/introduction/wavefront/index.html index ad2f51c..a0f084b 100644 --- a/introduction/wavefront/index.html +++ b/introduction/wavefront/index.html @@ -1,5 +1,5 @@ Wavefronts of Computation - Domain Flow Architecture -

Wavefronts of Computation

Wavefronts of Computation

\ No newline at end of file + 
\ No newline at end of file diff --git a/linearsolvers/index.html b/linearsolvers/index.html index 96e7432..2e031f2 100644 --- a/linearsolvers/index.html +++ b/linearsolvers/index.html @@ -1,10 +1,10 @@ Linear Solvers - Domain Flow Architecture -

Linear Solvers

Solving systems of equations is the impetus for the class of algorithms called linear solvers.

\ No newline at end of file diff --git a/linearsolvers/lu/index.html b/linearsolvers/lu/index.html index 3824f8e..001076a 100644 --- a/linearsolvers/lu/index.html +++ b/linearsolvers/lu/index.html @@ -3,15 +3,15 @@ $$A = L \otimes U$$.">Gaussian Elimination - Domain Flow Architecture -

Gaussian Elimination

Gaussian Elimination, also known as $LU$ decomposition, decomposes a linear transformation +

Gaussian Elimination

Gaussian Elimination, also known as $LU$ decomposition, decomposes a linear transformation defined by the matrix $A$ into a lower-triangular matrix $L$ and an upper-triangular matrix $U$ -such that

$$A = LU$$.
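A minimal Doolittle-style sketch of this factorization, assuming no pivoting is needed (all leading principal minors nonsingular); the `lu` helper is an illustrative name, not a library routine.

```python
import numpy as np

# Doolittle LU decomposition without pivoting: eliminate below the
# diagonal column by column, recording the multipliers in L.
def lu(A):
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

A = np.array([[4., 3.],
              [6., 3.]])
L, U = lu(A)
print(L @ U)   # reconstructs A
```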

\ No newline at end of file + 
\ No newline at end of file diff --git a/linearsolvers/solvers/index.html b/linearsolvers/solvers/index.html index 2ff772a..3517179 100644 --- a/linearsolvers/solvers/index.html +++ b/linearsolvers/solvers/index.html @@ -1,10 +1,10 @@ Linear Solvers - Domain Flow Architecture -

Linear Solvers

Linear solvers are algorithms designed to solve a linear system of equations.
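Once a factorization is available, triangular systems are solved by substitution. Here is a sketch of back substitution for an upper-triangular system; `back_substitute` is a hypothetical name for illustration.

```python
import numpy as np

# Solve the upper-triangular system U x = b by back substitution,
# the final step of most direct linear solvers.
def back_substitute(U, b):
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2., 1.],
              [0., 3.]])
b = np.array([5., 6.])
print(back_substitute(U, b))   # [1.5, 2.]
```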

\ No newline at end of file diff --git a/matrixkernels/index.html b/matrixkernels/index.html index 6e37172..fb76dc8 100644 --- a/matrixkernels/index.html +++ b/matrixkernels/index.html @@ -1,12 +1,12 @@ Matrix Kernels - Domain Flow Architecture -

Matrix Kernels

Matrix Kernels are important to characterize and classify the underlying system of equations. +

Matrix Kernels

Matrix kernels are important for characterizing and classifying the underlying system of equations. Identifying singularity and quantifying the null-space of a matrix are key operations -before we can try to solve systems of equations.

\ No newline at end of file + 
\ No newline at end of file diff --git a/matrixkernels/matrixkernels/index.html b/matrixkernels/matrixkernels/index.html index f7895f3..a2118e7 100644 --- a/matrixkernels/matrixkernels/index.html +++ b/matrixkernels/matrixkernels/index.html @@ -7,16 +7,16 @@ $L$ is the vector space of all elements $v$ of $V$ such that $L(v) = 0$, where 0 denotes the zero vector in $W, or more symbolically:">Matrix Kernels - Domain Flow Architecture -

Matrix Kernels

In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the linear subspace +

Matrix Kernels

In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the linear subspace of the domain of the map which is mapped to the zero vector. That is, given a linear map

$$L : V \rightarrow W$$ between two vector spaces $V$ and $W$, the kernel of

$L$ is the vector space of all elements $v$ of $V$ such that $L(v) = 0$, -where $0$ denotes the zero vector in $W$, or more symbolically:

$$\ker(L) = \{ v \in V \hspace1ex | \hspace1ex L(v) = 0\} = L^{-1}(0)$$.
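Numerically, a basis for the kernel can be computed with the singular value decomposition: right singular vectors whose singular values fall below a tolerance span the null space. The `kernel_basis` helper and its tolerance are illustrative choices, not a standard API.

```python
import numpy as np

# Right singular vectors beyond the numerical rank span ker(A).
def kernel_basis(A, tol=1e-10):
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T            # columns span ker(A)

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])      # rank 1, so ker(A) is 2-dimensional
K = kernel_basis(A)
print(K.shape)                    # (3, 2)
print(np.allclose(A @ K, 0))      # True: A maps the basis to zero
```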

\ No newline at end of file + 
\ No newline at end of file diff --git a/search/index.html b/search/index.html index 3cfbe6f..3b0aa5d 100644 --- a/search/index.html +++ b/search/index.html @@ -1,11 +1,11 @@ Search - Domain Flow Architecture -

Search

-

\ No newline at end of file diff --git a/tags/algorithm/index.html b/tags/algorithm/index.html index fc6f661..68e30f8 100644 --- a/tags/algorithm/index.html +++ b/tags/algorithm/index.html @@ -1,10 +1,10 @@ Algorithm - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/computational-spacetime/index.html b/tags/computational-spacetime/index.html index 12aa259..0e827cb 100644 --- a/tags/computational-spacetime/index.html +++ b/tags/computational-spacetime/index.html @@ -1,10 +1,10 @@ Computational-Spacetime - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/conditioning/index.html b/tags/conditioning/index.html index 0bd1357..78d02e9 100644 --- a/tags/conditioning/index.html +++ b/tags/conditioning/index.html @@ -1,10 +1,10 @@ Conditioning - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/derivation/index.html b/tags/derivation/index.html index e2da080..5256aee 100644 --- a/tags/derivation/index.html +++ b/tags/derivation/index.html @@ -1,10 +1,10 @@ Derivation - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/domain-flow/index.html b/tags/domain-flow/index.html index 4d86466..7d6ba69 100644 --- a/tags/domain-flow/index.html +++ b/tags/domain-flow/index.html @@ -1,10 +1,10 @@ Domain-Flow - Tag - Domain Flow Architecture -

Tag - Domain-Flow

A

  • An Example

C

D

F

L

P

\ No newline at end of file diff --git a/tags/dsp/index.html b/tags/dsp/index.html index 01f5a36..151278c 100644 --- a/tags/dsp/index.html +++ b/tags/dsp/index.html @@ -1,10 +1,10 @@ Dsp - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/filtering/index.html b/tags/filtering/index.html index 17566a8..1092355 100644 --- a/tags/filtering/index.html +++ b/tags/filtering/index.html @@ -1,10 +1,10 @@ Filtering - Tag - Domain Flow Architecture -

Tag - Filtering

D

\ No newline at end of file diff --git a/tags/free-schedule/index.html b/tags/free-schedule/index.html index 806542f..090826c 100644 --- a/tags/free-schedule/index.html +++ b/tags/free-schedule/index.html @@ -1,10 +1,10 @@ Free-Schedule - Tag - Domain Flow Architecture -

Tag - Free-Schedule

F

\ No newline at end of file diff --git a/tags/identification/index.html b/tags/identification/index.html index 762103b..8de2cc8 100644 --- a/tags/identification/index.html +++ b/tags/identification/index.html @@ -1,10 +1,10 @@ Identification - Tag - Domain Flow Architecture -

Tag - Identification

I

\ No newline at end of file diff --git a/tags/index-space/index.html b/tags/index-space/index.html index ed266f2..5ae855e 100644 --- a/tags/index-space/index.html +++ b/tags/index-space/index.html @@ -1,10 +1,10 @@ Index-Space - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/index.html b/tags/index.html index f3ea3e1..53b29ce 100644 --- a/tags/index.html +++ b/tags/index.html @@ -1,10 +1,10 @@ Tags - Domain Flow Architecture - \ No newline at end of file diff --git a/tags/lattice/index.html b/tags/lattice/index.html index 0597bec..a182c85 100644 --- a/tags/lattice/index.html +++ b/tags/lattice/index.html @@ -1,10 +1,10 @@ Lattice - Tag - Domain Flow Architecture -
\ No newline at end of file diff --git a/tags/linear-schedule/index.html b/tags/linear-schedule/index.html index 6fef349..4f88ca8 100644 --- a/tags/linear-schedule/index.html +++ b/tags/linear-schedule/index.html @@ -1,10 +1,10 @@ Linear-Schedule - Tag - Domain Flow Architecture -

Tag - Linear-Schedule

L

\ No newline at end of file diff --git a/tags/matrix-multiply/index.html b/tags/matrix-multiply/index.html index a32d351..f032018 100644 --- a/tags/matrix-multiply/index.html +++ b/tags/matrix-multiply/index.html @@ -1,10 +1,10 @@ Matrix-Multiply - Tag - Domain Flow Architecture -

Tag - Matrix-Multiply

A

  • An Example

C

D

F

L

P

\ No newline at end of file diff --git a/tags/spectral-analysis/index.html b/tags/spectral-analysis/index.html index cfe60ba..a34106c 100644 --- a/tags/spectral-analysis/index.html +++ b/tags/spectral-analysis/index.html @@ -1,10 +1,10 @@ Spectral-Analysis - Tag - Domain Flow Architecture -

Tag - Spectral-Analysis

S

\ No newline at end of file diff --git a/tags/transform/index.html b/tags/transform/index.html index 10b4dd1..29e3f02 100644 --- a/tags/transform/index.html +++ b/tags/transform/index.html @@ -1,10 +1,10 @@ Transform - Tag - Domain Flow Architecture -

Tag - Transform

T

  • Transforms
\ No newline at end of file