Improve "underlying platform" references in spec #462
Comments
Thanks @inexorabletash for formalizing these suggestions. I've also been thinking about what we can factor out of the quite repetitive text in the algorithms. The intent was to first finish the first pass on the algorithms separately for each op, and then do exactly what you are suggesting now. It's a good time for it. The challenge is that most algorithms have similar-looking prose, but the underlying implementation can be quite different from one case to the other. It might be misleading to unify at the syntactic level and miss the semantic differences. A first target to formalize could be how to express, in a common formalized way, the sequence specified for ops. For operators, it could be reduced by defining a "create MLActivation" algorithm and then referring to it from e.g. the elu() steps:
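For illustration only, here is a minimal JavaScript sketch of that factoring idea; createActivation() and the simplified elu() below are placeholder names standing in for the spec's "create MLActivation" algorithm and the elu() steps, not actual spec text or implementation code.

function createActivation(builder, opName, options = {}) {
  // Steps common to every activation-producing builder method:
  // record the builder, the op name, and the validated options, and
  // return an object representing the activation.
  return { builder, opName, options };
}

function elu(builder, options = { alpha: 1.0 }) {
  // The elu() steps reduce to a call into the shared algorithm plus
  // whatever is specific to elu itself (here, the alpha option).
  return createActivation(builder, 'elu', options);
}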
Whereas for the ops, it refers to the underlying platform behavior specific to each op. I was thinking about defining a map from op algorithm name --> impl internal slot, and using that from the prose. But it would be quite verbose and futile: it should also be possible to just say that in prose, taking the op_alg_name as a macro input (expressed in prose).
@inexorabletash, thanks for initiating this thread! I'd agree this is an area that needs to be improved. In the current spec, the "underlying platform" / "platform op" is a kind of abstraction over the native APIs used in implementations, such as XNNPACK and DirectML. We assume the "platform op" holds the WebNN op semantics which, as you mentioned, are usually defined by links rather than by algorithm steps. In some cases this causes ambiguity, for example issue 358: Please clarify interpolation algorithm for resample2d. IIUC, the op abstraction sounds like implementing the WebNN ops without relying on any (native) ML APIs. I'd agree it would make the spec more precise and "self-contained". It reminds me of the webnn-baseline project, which is a pure JavaScript implementation used for WPT test data generation. It doesn't rely on any ML libraries (e.g. TensorFlow.js, which, by the way, is used by webnn-polyfill for performance reasons) either. All ops have a JavaScript implementation, for example: https://github.com/webmachinelearning/webnn-baseline/blob/main/src/binary.js.

function binary(inputA, inputB, binaryFunc) {
const outputShape = getBroadcastShape(inputA.shape, inputB.shape);
const inputABroadcast = broadcast(inputA, outputShape);
const inputBBroadcast = broadcast(inputB, outputShape);
const outputSize = sizeOfShape(outputShape);
const output = new Tensor(outputShape);
for (let i = 0; i < outputSize; ++i) {
const a = inputABroadcast.getValueByIndex(i);
const b = inputBBroadcast.getValueByIndex(i);
const c = binaryFunc(a, b);
output.setValueByIndex(i, c);
}
return output;
}

This would add a few more things to your addition op algorithm to make it an element-wise addition; some initial thoughts:
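For illustration, here is a hedged sketch (not verbatim webnn-baseline source) of how an element-wise add could be expressed on top of the binary() helper above:

function add(inputA, inputB) {
  // Element-wise addition: reuse binary() with `+` as the combining function.
  return binary(inputA, inputB, (a, b) => a + b);
}

function sub(inputA, inputB) {
  // Other element-wise binary ops follow the same pattern.
  return binary(inputA, inputB, (a, b) => a - b);
}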
Looping in @wchao1115 for comments.
Correct. We would expect that real implementations would optimize heavily using platform APIs, parallel instructions, hardware accelerators, etc. Using the JS code to help craft the algorithm implementations seems like a great idea!
To be clear, I was really just showing the style I'd follow rather than giving a concrete example for inclusion in the spec. Tensors etc. definitely need to be used rather than just primitive inputs. Speaking of style, I wrote:
But we could use a style introducing the algorithm to simplify this, e.g. "takes two inputs (a and b, both numbers)".
This change introduces a new section for Algorithms, following APIs, to collect algorithms referenced throughout the specification. A section for Broadcasting is introduced, which defines broadcasting shapes and gives an explicit algorithm matching WebNN implementations of NumPy's General Broadcasting Rules. Definitions for "broadcastable" and "unidirectionally broadcastable" are introduced. The previous definition of "broadcast-shapes" is removed in favor of these new algorithms. For webmachinelearning#324, webmachinelearning#378, webmachinelearning#462, and potentially webmachinelearning#523.
The default linking for [=object=] is to FileAPI's blob URL entry object member[1], which is definitely wrong. Replace such references with:
* Infra's ordered map[2], when a dictionary is intended, as that's what a JavaScript object is translated into via bindings.
* Links to new definitions for "platform operator" and "platform operand" which include the "implementation-defined" phrase so links to the terms can be greatly simplified. These definitions should be further expanded to include "connect", "input" and "output" but that can be tackled later.
This reduces the number of "implementation-defined" phrases to just a handful in the spec, which helps with webmachinelearning#462.
[1] https://www.w3.org/TR/FileAPI/#blob-url-entry-object
[2] https://infra.spec.whatwg.org/#ordered-map
Co-authored-by: Anssi Kostiainen <[email protected]>
Co-authored-by: Ningxin Hu <[email protected]>
* New content: Add definition for shape broadcasting
This change introduces a new section for Algorithms, following APIs, to collect algorithms referenced throughout the specification. A section for Broadcasting is introduced, which defines broadcasting shapes and gives an explicit algorithm matching WebNN implementations of NumPy's General Broadcasting Rules. Definitions for "broadcastable" and "unidirectionally broadcastable" are introduced. The previous definition of "broadcast-shapes" is removed in favor of these new algorithms. Use the broadcasting definition in expand(), rather than bespoke steps. For #324, #378, #462, and potentially #523.
Co-authored-by: Dwayne Robinson <[email protected]>
* Fix prelu parameter order
Co-authored-by: Dwayne Robinson <[email protected]>
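For readers unfamiliar with the rules referenced in those commits, a minimal, non-normative JavaScript sketch of NumPy-style shape broadcasting (not the spec's actual algorithm text) could look like this:

function broadcastShapes(shapeA, shapeB) {
  // Align the shapes at their trailing dimensions and walk backwards;
  // a missing dimension is treated as 1, and two dimensions are
  // compatible when they are equal or one of them is 1.
  const rank = Math.max(shapeA.length, shapeB.length);
  const result = new Array(rank);
  for (let i = 0; i < rank; ++i) {
    const a = shapeA[shapeA.length - 1 - i] ?? 1;
    const b = shapeB[shapeB.length - 1 - i] ?? 1;
    if (a !== b && a !== 1 && b !== 1) {
      throw new Error('Shapes are not broadcastable');
    }
    result[rank - 1 - i] = Math.max(a, b);
  }
  return result;
}

// Example: broadcastShapes([256, 256, 3], [3]) yields [256, 256, 3].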
Restating this issue:
And commentary: although spec purity is great, I don't think this is particularly high priority; tackling the second category of issues on a case-by-case basis as needed to resolve interop issues seems fine.
The spec currently uses algorithms of the form (just examples copy/pasted from different places):
Ideally this would all be defined. Here's one way to approach it, imagining a much simpler interface:
First we define the abstract notion of an op:
An op represents a mathematical operation as part of a graph. An op has zero or more inputs (ops) and one output (an op), and an algorithm which, when invoked, takes a corresponding number of input values and produces an output value.
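As a rough, non-normative JavaScript sketch of that abstract notion (makeOp() and evaluateOp() are illustrative names, not proposed spec machinery):

function makeOp(inputs, algorithm) {
  // An op is just its input ops plus the algorithm that computes its
  // output value from the corresponding input values.
  return { inputs, algorithm };
}

function evaluateOp(op) {
  // Evaluate the inputs recursively, then invoke the op's algorithm.
  const inputValues = op.inputs.map(evaluateOp);
  return op.algorithm(...inputValues);
}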
In a dedicated section of the spec, we'd rigorously define each op's algorithm:
An addition op is an op with two inputs (both numbers) and the following algorithm:
An average op is an op with one input (a list) and the following algorithm:
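Informally, and with JavaScript standing in for the spec's prose steps, those two algorithms might be sketched as:

// Addition op: takes two numbers and returns their sum.
const additionAlgorithm = (a, b) => a + b;

// Average op: takes a list of numbers and returns their arithmetic mean.
const averageAlgorithm = (list) =>
  list.reduce((sum, value) => sum + value, 0) / list.length;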
Obviously these are overly simple, but in the spec right now the more complex ops are only defined by links. I'd tackle this incrementally, but it should be in the spec itself! See Web Audio for examples, e.g. https://webaudio.github.io/web-audio-api/#fft-windowing-and-smoothing-over-time. These could be inline with the builder methods or in a separate section of the spec.
And then in a builder method for add() it'd be something like the steps below. That's probably not quite correct, but hopefully the intention is clear.
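As one possible reading of those steps, here is a rough JavaScript sketch reusing makeOp() and evaluateOp() from above (illustrative only, not the issue's original snippet; the real spec would operate on MLOperand objects and platform operators):

function add(a, b) {
  // A real builder method would also validate its arguments; this sketch
  // only creates an addition op wired to its two inputs and returns it
  // so that later ops can consume its output.
  return makeOp([a, b], (x, y) => x + y);
}

// Example: evaluateOp(add(makeOp([], () => 2), makeOp([], () => 3))) === 5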
More notes:
Okay, that's a lot of rambling and as noted I may be missing important details, so I'll stop here. Let's iterate (here or a PR?) until we come up with a good approach that balances readability, maintainability, and precision.
cc: @huningxin