[MLBuffer] Creation and representing MLBuffer on a XPU devices #542

Open
bbernhar opened this issue Jan 30, 2024 · 27 comments

@bbernhar

bbernhar commented Jan 30, 2024

Purpose/Motivation

Defines a device-based storage object that may be used by WebNN operations. This is a sub-issue of #482.

Proposed API

typedef unsigned long MLFlagsConstant;

[Exposed=(Window, DedicatedWorker)]
interface MLBuffer {
  readonly attribute MLFlagsConstant usage;
  readonly attribute MLOperandDescriptor descriptor;
  [CallWith=Isolate] void destroy();
};
[Exposed=(Window, DedicatedWorker), SecureContext]
namespace MLBufferUsage {
    // TBD
};

[Exposed=(Window, DedicatedWorker), SecureContext]
partial interface MLContext {
    Promise<MLBuffer> createBuffer(MLOperandDescriptor descriptor, MLBufferUsage usages);
};

Example JS

const ml_buffer = await mlContext.createBuffer(descriptor, usages);
ml_buffer.destroy(); // any further use of ml_buffer is invalid

  • The buffer's allocation will be zeroed (as it is for WebGPU's createBuffer() method).
  • The layout of an MLBuffer is always known (and linear access is assumed).
  • destroy() gets called on the context timeline but doesn't actually release the memory until the device signals completion.
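
For concreteness, here is a minimal end-to-end sketch of how these pieces would compose, assuming the companion writeBuffer()/readBuffer()/dispatch() methods discussed later in this thread; the descriptor values, mlGraph, and the usages argument are placeholders (MLBufferUsage is still TBD):

// Sketch only: mlGraph and `usages` are assumed to exist elsewhere.
const desc = { dataType: 'float32', dimensions: [2, 2] };
const input = await mlContext.createBuffer(desc, usages);
const output = await mlContext.createBuffer(desc, usages);

mlContext.writeBuffer(input, new Float32Array([1, 2, 3, 4])); // upload
mlContext.dispatch(mlGraph, { x: input }, { y: output });     // execute
const result = await mlContext.readBuffer(output);            // download

input.destroy();  // queued on the context timeline; memory is released
output.destroy(); // once the device signals completion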

Edits

  • 5/14/24 - Removed "size" in favor of using MLOperandDescriptor
  • 6/03/24 - Added usage flags and descriptor attributes
  • 7/09/24 - createBuffer() now returns promise.

Alternative API proposals

N/A

Opens

  1. Where will an MLBuffer's memory be allocated on systems where an MLContext may not be as closely tied to a given physical device as an IDMLDevice? See Need to understand how WebNN supports implementation that involves multiple devices and timelines #350 @a-sully
  2. Must an MLBuffer only be used with an MLContext it was created from? @a-sully
  3. Can an MLBuffer's size always be known at the time of buffer allocation? @a-sully
  4. When will MLBuffer be deallocated if destroy() is not called? @a-sully
  5. Does MLBuffer require explicit buffer usages (ex. input, output, or both)? @bbernhar
  6. Does MLBuffer need to support being a staging buffer? @bbernhar
  7. Is a zero sized MLBuffer allowed? @bbernhar
@bbernhar bbernhar changed the title [MLBuffer] Creation and representing MLBuffer on a XPU devices (ie. MLContext.createBuffer) [MLBuffer] Creation and representing MLBuffer on a XPU devices Jan 30, 2024
@a-sully
Contributor

a-sully commented Apr 17, 2024

My recent investigation into supporting MLBuffer on CoreML has led me to the following two suggestions for createBuffer():

1. We need a WebGPU usage flag (at minimum)

The only zero-copy way to pass a buffer to both WebGPU (as an IOSurface) and CoreML (as an MLMultiArray) is to first allocate the buffer as an IOSurface containing "float16" data (IOSurface -> CVPixelBuffer -> MLMultiArray)

If the MLBuffer is to be used with WebGPU it must be allocated in this fashion (to be zero-copy, at least), whereas an MLBuffer which is only used within WebNN may be allocated as an MLMultiArray directly (more on that below)

2. MLBufferDescriptor should include an MLOperandDescriptor rather than an MLSize64

CoreML's inputs and outputs are given as MLMultiArrays, which require the data type and dimensions to be known. If we're to allocate a hardware buffer for createBuffer(), this information must be known.

Given that the dimensions + data type of input and output operands to an MLGraph are well-defined anyways, it seems reasonable to enforce that an MLBuffer must have matching constraints to be passed as an input or output to an MLGraph as #544 describes? Is there a reason why we should keep MLSize64?

@bbernhar
Author

Thanks @a-sully for delving into the CoreML side of things.

Regarding the need for a WebGPU usage flag:

Is it feasible for an MLBuffer to always be created as an MLMultiArray where, upon import to WebGPU, we could assign or request the usages? Assigning GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC appears to be sufficient.

As for the question about keeping MLSize64:

Without MLSize64, any ML framework that doesn't represent its tensor datatype like MLMultiArray would require re-architecting to avoid creating (especially output) tensors from raw allocations (or malloc). Alternatively, the web developer would need to defer calling createBuffer() until dispatch(), impacting first-inference time. Could MLOperandDescriptor be made optional instead? The size could then be ignored where irrelevant.

@a-sully
Contributor

a-sully commented Apr 18, 2024

Is it feasible for an MLBuffer to always be created as an MLMultiArray where, upon import to WebGPU, we could assign or request the usages?

AFAICT an MLMultiArray cannot be handed off to WebGPU. It's a data type specific to CoreML. Importing to a type WebGPU can understand would always require a copy - even on UMA systems, which would be unfortunate!

Without MLSize64, any ML framework that doesn't represent its tensor datatype like MLMultiArray would require re-architecting to avoid creating (especially output) tensors from raw allocations (or malloc). Alternatively, the web developer would need to defer calling createBuffer() until dispatch(), impacting the first inference-time.

Hmm I'm not sure if I understand your concern.... Implementations are still welcome to allocate an MLBuffer as one contiguous block of memory. It's just that the WebNN "front end" would assert (when you call dispatch()) that the dtype and dimensions of the passed-in MLBuffer match what the graph expects, rather than just that the sizes are the same. Concretely, that maps to these checks in your prototype CL.

Is the use case you're referring to one where a single MLBuffer is assumed to be able to be contorted into different dtypes and dimensions? For example:

const mlBuffer = new MLBuffer({size:3*4*4});

// `graph1` expects a float32 output with shape [3, 4]
context.dispatch(graph1, inputs, {'out': mlBuffer});

// `graph2` expects a float16 input with shape [4, 3, 2]
context.dispatch(graph2, {'in': mlBuffer}, outputs);

This hides an internal reinterpretation of the data type and dimensions of what's assumed to be an opaque bag of bytes. I think there's a reasonable argument that this implementation detail should not make its way into the WebNN spec, which shouldn't prescribe a particular implementation.

WebNN has reshape and cast operators. In the example above, graph2 may use these operators to convert an input into whatever dtype and dimensions it needs, if it still wants to be able to use mlBuffer. An advantage of this approach is that the otherwise opaque reinterpretation of the buffer can be expressed in terms of other well-defined operators.
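
As a sketch of that suggestion (hypothetical builder code; note that cast() converts values rather than reinterpreting raw bits, and reshape() preserves the element count, so the shapes below are chosen to stay at 12 elements):

// graph2 takes graph1's float32 [3, 4] output as a typed input, then
// converts it with well-defined operators instead of raw reinterpretation.
const builder = new MLGraphBuilder(context);
const input = builder.input('in', { dataType: 'float32', dimensions: [3, 4] });
const half = builder.cast(input, 'float16');        // 12 float32s -> 12 float16s
const reshaped = builder.reshape(half, [2, 3, 2]);  // still 12 elements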

Could you elaborate on the use case(s) you have in mind?

Could MLOperandDescriptor be made optional instead? The size could then be ignored where irrelevant.

What would be the expected behavior on platforms which require a data type and dimensions when the buffer is allocated? An MLOperandDescriptor implies a size - but not the other way around.

@bbernhar
Author

bbernhar commented Apr 18, 2024

AFAICT an MLMultiArray cannot be handed off to WebGPU.

I was expecting we'd start the allocation in CoreML via MLBuffer, then import it as an MTLBuffer into WebGPU, using the GPUBuffer usages I mentioned.

Implementations are still welcome to allocate an MLBuffer as one contiguous block of memory

Consider a native C++ framework which implements a Tensor dtype as a bag of bytes. If you want to deploy this ML framework using the WebNN JS API as an execution provider (or EP), it expects buffers to be allocated given only a size. If we force createBuffer() to accept only an MLOperandDescriptor, then this EP couldn't simply map Tensor allocation to createBuffer(). They would need to either come up with an MLBuffer-specific solution that preserves MLOperandDescriptor or defer createBuffer(), which seems either burdensome or ineffective.

@a-sully
Contributor

a-sully commented Apr 19, 2024

AFAICT an MLMultiArray cannot be handed off to WebGPU.

I was expecting we'd start the allocation in CoreML via MLBuffer, then import it as an MTLBuffer into WebGPU, using the GPUBuffer usages I mentioned.

Ah, I think my wording of "We need a WebGPU usage flag" above was misleading. I'm not suggesting that we need WebGPU usage flags here, but rather a usage flag saying "I want this MLBuffer to be convertible to a GPUBuffer" (because the implementation may use that information to determine where/how the buffer should be allocated). Does that clear things up?

Could you also clarify what exactly you mean by "start the allocation in CoreML"? I assume you mean "as an MLMultiArray", but that would require the dtype and dimensions to be known, no?

Implementations are still welcome to allocate an MLBuffer as one contiguous block of memory

Consider a native C++ framework which implements a Tensor dtype as a bag of bytes. If you want to deploy this ML framework using the WebNN JS API as an execution provider (or EP), it expects buffers to be allocated given only a size. If we force createBuffer() to accept only an MLOperandDescriptor, then this EP couldn't simply map Tensor allocation to createBuffer(). They would need to either come up with an MLBuffer-specific solution that preserves MLOperandDescriptor or defer createBuffer(), which seems either burdensome or ineffective.

Thanks for the explanation. Could you provide a concrete example of where this concern is relevant? A quick glance at some common ML frameworks suggests that just size is often not sufficient to allocate a tensor. OnnxRuntime's JavaScript Tensor requires dtype and dimensions, for example. As does TFLite’s equivalent. Are there known examples where a size is available but not the dtype and dimensions? Presumably the MLBuffer is being allocated with use by some given MLGraph in mind, and the data types and dimensions of inputs and outputs must already be known? (input() and build() (for outputs) each require an MLOperandDescriptor)

Another consideration is that size may not be enough regardless of whether we want to replace size with an MLOperandDescriptor. As mentioned above, I expect we'll need usage flags, too. Does your concern still hold if arguments other than size become required?

@bbernhar
Author

Could you also clarify what exactly you mean by "start the allocation in CoreML"?

Could we pass a union to createBuffer() which specifies either the size or MLOperandDescriptor, so MLBuffer could always be created as an MLMultiArray? If not, another (possible) alt. solution is to have createBuffer(size) defer creation of MLMultiArray until dispatch().
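
To illustrate the union idea, the two accepted call shapes might look like this (purely a sketch; neither overload exists in the current proposal):

// Option A: typed creation, backed by an MLMultiArray immediately.
const typed = await mlContext.createBuffer({ dataType: 'float32', dimensions: [3, 4] });

// Option B: size-only creation; the backend defers MLMultiArray creation
// until dispatch(), when the dtype and shape become known.
const untyped = await mlContext.createBuffer({ size: 48 });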

Are there known examples where a size is available but not the dtype and dimensions?

Yes, the ORT web tensor dtype can only be implemented behind a "malloc"-like C interface. When WebNN is used as an EP, it exists within the ML runtime itself.

@reillyeon
Contributor

Yes, the ORT web tensor dtype can only be implemented behind a "malloc"-like C interface.

I don't understand this comment because all of the Tensor constructors in that header take shape information. What am I missing?

@bbernhar
Author

I don't understand this comment because all of the Tensor constructors in that header take shape information. What am I missing?

Notice the Tensor constructor uses an IAllocator interface. That's the only way an MLBuffer can be created, because it must own the buffer for the specified shape. Funny enough, the shape information is right there, but the main point is that ORT expects it's possible to whip up a device buffer given only a size.

@a-sully
Contributor

a-sully commented Apr 19, 2024

Taking a step back, the Web Platform Design Principles implore us to "design based on user needs, not the underlying API or hardware":

This means newly proposed APIs should be designed with careful consideration on how they are intended to be used rather than how the underlying hardware, device, or native API available today.

The use cases for an MLBuffer - using some (hardware-optimized) buffer as an input or output to an ML graph - all require that the data type and dimensions of the buffer be known. We should not prescribe implementation details, such as that the buffer must be allocated contiguously, as this other design principle cautions:

Be particularly careful about exposing the exact lifecycle and data structures of the underlying native APIs. When possible, consider flexibility for new hardware.

The point about considering flexibility for new hardware is especially pertinent to WebNN :)

While I understand the desire to design a web platform API which (especially WASM) user-space frameworks can easily plug into, the web platform API should not bend over backwards to accommodate the implementation choices of any given framework. And the web platform API certainly should not bake in assumptions based on the current limitations of said frameworks! In this case, ORT does not support CoreML in cases where an MLMultiArray used as an output is not contiguously allocated. It seems likely that addressing that limitation would require changes to ORT which are ~the same as what would be needed to support MLBuffer if creating an MLBuffer required a dtype and dimensions?

@RafaelCintron
Collaborator

[@a-sully wrote]

The only zero-copy way to pass a buffer to both WebGPU (as an IOSurface) and CoreML (as an MLMultiArray) is to first allocate the buffer as an IOSurface containing "float16" data (IOSurface -> CVPixelBuffer -> MLMultiArray)

For Apple platforms, my understanding is you can go from MLMultiArray -> MTLBuffer by calling getBytesWithHandler + newBufferWithBytesNoCopy. With an MTLBuffer you should be able to create a WebGPU buffer.

Why are IOSurfaces required?

@huningxin
Contributor

another (possible) alt. solution is to have createBuffer(size) defer creation of MLMultiArray until dispatch().

Seems doable. writeBuffer() may hold the BigBuffer with user data, then at dispatch(), create an MLMultiArray by initWithDataPointer?

@a-sully
Contributor

a-sully commented Apr 20, 2024

For Apple platforms, my understanding is you can go from MLMultiArray -> MTLBuffer by calling getBytesWithHandler + newBufferWithBytesNoCopy. With an MTLBuffer you should be able to create a WebGPU buffer.

Good question! I originally thought so too, but my current understanding is that this is not generically true (i.e. for all data types). If anyone can definitively confirm or dispute this understanding (@mwyrzykowski?) please speak up! Alright here goes...

The docs of newBufferWithBytesNoCopy say that it:

Creates a buffer that wraps an existing contiguous memory allocation

whereas the docs for getBytesWithHandler say of the buffer:

It may not store these scalar values contiguously

so I would assume that this would not be allowed (or at least not be zero-copy) unless the MLMultiArray was specifically allocated contiguously.

How can we ensure an MLMultiArray is allocated contiguously?

Of all the MLMultiArray constructors, the candidates for ensuring a contiguous memory allocation seem to be:

  • initWithDataPointer:shape:dataType:strides:deallocator:
  • initWithPixelBuffer:shape:

The first one looks promising! Unfortunately it seems - based on past offline discussions - that CoreML internally makes a copy of the bytes when using this constructor. That strides is a parameter seems to corroborate this.

So this would not be zero-copy:

another (possible) alt. solution is to have createBuffer(size) defer creation of MLMultiArray until dispatch().

Seems doable. writeBuffer() may hold the BigBuffer with user data, then at dispatch(), create an MLMultiArray by initWithDataPointer?

The latter constructor takes a CVPixelBuffer, but this only works if the CVPixelBuffer is a "float16" IOSurface in disguise:

Use this initializer to create an IOSurface-backed MLMultiArray that reduces the inference latency by avoiding the buffer copy to and from some compute units.

The pixel buffer’s pixel format type must be kCVPixelFormatType_OneComponent16Half. The MLMultiArray data type is MLMultiArrayDataType.float16.

So with regard to this question....

Why are IOSurfaces required?

It seems that the only way to avoid copies of a backing memory which is to be shared as both an MLMultiArray and an MTLBuffer is to start with a float16 IOSurface. Unfortunately this suggests that zero-copy buffer sharing is only possible under certain dtype + "do we need to share with WebGPU" configurations. Of course, if we know the memory will stay within CoreML (i.e. it doesn't need to be shared with WebGPU) then we can allocate an MLMultiArray directly, though this would require dtype and shape to be known before writeBuffer().

Data Type | WebNN Use Only                              | WebGPU Interop
float16   | ✅ Zero copy (as MLMultiArray or IOSurface)  | ✅ Zero copy (as IOSurface)
float32   | ✅ Zero copy (as MLMultiArray)               | 🌕 Data copies (with initWithDataPointer)
float64   | ✅ Zero copy (as MLMultiArray)               | 🌕 Data copies (with initWithDataPointer)
int32     | ✅ Zero copy (as MLMultiArray)               | 🌕 Data copies (with initWithDataPointer)
other     | ❓ May be emulated as int32?                 | ❓ Not sure

@mwyrzykowski

For Apple platforms, my understanding is you can go from MLMultiArray -> MTLBuffer by calling getBytesWithHandler + newBufferWithBytesNoCopy. With an MTLBuffer you should be able to create a WebGPU buffer.

Good question! I originally thought so too, but my current understanding is that this is not generically true (i.e. for all data types). If anyone can definitively confirm or dispute this understanding (@mwyrzykowski?) please speak up! Alright here goes...

It is zero copy in CoreML, but anything other than fp16 + CVPixelBuffer will result in a copy below CoreML.

@bbernhar
Author

web platform API certainly should not bake in assumptions based on the current limitations of said frameworks!

Not all HW APIs require an MLOperandDescriptor for buffer creation; this isn't specific to ORT (ex. DML). If the ML framework wants to pre-allocate buckets of memory but WebNN can't support that (as a GPUBuffer can), that's equally an assumption on WebNN's behalf, IMO.

Unless MLMultiArray can NOT be implemented through an MLBuffer, it seems unnecessary to require only an MLOperandDescriptor.

@bbernhar
Author

bbernhar commented May 9, 2024

@a-sully Thinking of a way forward to unblock CoreML.

Here are the options I've gathered:

  1. Use MLBuffer(MLOperandDescriptor) and workaround the problem in ORT by calling createBuffer() in dispatch().
  2. Re-implement MLBuffer API to be typed like MLMultiArray, WebNN RT provides an IAllocator impl.
  3. Keep MLBuffer and have the CoreML impl. cache MLMultiArray(s) upon dispatch().

I am not a fan of (1) because it bakes assumptions into the WebNN spec (ex. that ORT never pre-allocates or uses untyped buffers). Untyped buffers (aka byte buffers with a linear layout), for example, could be partially dispatched via an MLBufferView, re-used between multiple calls to dispatch(), or pre-allocated from a larger MLBuffer using createBuffer(size); see the sketch below.
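
For reference, a hypothetical sketch of the sub-allocation pattern untyped buffers would enable (MLBufferView is not part of any proposal here; the names are illustrative only):

// Hypothetical: illustrates piecemeal re-use of one large allocation.
const pool = await mlContext.createBuffer({ size: 1024 * 1024 });
const viewA = new MLBufferView(pool, /*offset=*/ 0,    /*byteLength=*/ 4096);
const viewB = new MLBufferView(pool, /*offset=*/ 4096, /*byteLength=*/ 4096);
mlContext.dispatch(graph, { x: viewA }, { y: viewB });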

The other option (2) means WebNN backends (ex. DML resources) must be re-implemented to work like MLMultiArray (which requires strides to read and write), which is a considerable effort/burden. If (3) is possible, it seems like the simplest path forward; did you have a chance to investigate this?

@a-sully
Contributor

a-sully commented May 9, 2024

Thanks for the input @bbernhar. I've been exploring this space more, and I still believe the path forward if we want "a device-based storage object that may be used by WebNN operations" is the following:

4. Use MLBuffer(MLOperandDescriptor, MLBufferUsageFlags) and frameworks which use WebNN should not assume implementation details, such as that tensors will always be contiguously allocated

Responses inline:


web platform API certainly should not bake in assumptions based on the current limitations of said frameworks!

Not all HW APIs require an MLOperandDescriptor for buffer creation; this isn't specific to ORT (ex. DML). If the ML framework wants to pre-allocate buckets of memory but WebNN can't support that (as a GPUBuffer can), that's equally an assumption on WebNN's behalf, IMO.

Hmm I'm not following here. The question is not whether HW APIs need an MLOperandDescriptor, but whether HW APIs can support the contract specified by MLBuffer.

If an ML framework wants to allocate a GPUBuffer, how is that relevant to WebNN? Could you please elaborate on this point?

I am not a fan of (1) because it bakes assumptions into the WebNN spec (ex. that ORT never pre-allocates or uses untyped buffers). Untyped buffers (aka byte buffers with a linear layout), for example, could be partially dispatched via an MLBufferView, re-used between multiple calls to dispatch(), or pre-allocated from a larger MLBuffer using createBuffer(size).

Please refer back to #542 (comment). The WebNN spec should not prescribe implementation details, such as that the buffer must be allocated contiguously. This violates the design principles here: https://w3ctag.github.io/design-principles/#usecase-oriented-apis

The other option (2) means WebNN backends (ex. DML resources) must be re-implemented to work like MLMultiArray (which requires strides to read and write), which is a considerable effort/burden.

I don't understand this suggestion. MLOperandDescriptor does not include strides - just dtype and shape. And this shape does not imply there must be strides; how/where an MLBuffer is allocated is entirely an implementation detail. If an MLBuffer were to be created with an MLOperandDescriptor, presumably the user agent's DML backend could calculate the total byte size and allocate a contiguous array as it currently does. The only thing that would change in the user agent implementation is a check that an MLBuffer's MLOperandDescriptor matches the MLOperandDescriptor expected by the input and output operands (in the Chromium implementation, this would be a platform-agnostic check that happens in the renderer anyways).
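
For example, the byte size implied by an MLOperandDescriptor is just the element count times the element byte width, so a size-based contiguous allocation falls out of the descriptor directly (a sketch; byte widths follow the WebNN operand data types):

const BYTES_PER_ELEMENT = { float16: 2, float32: 4, int32: 4, uint32: 4, int8: 1, uint8: 1 };

function byteLength({ dataType, dimensions }) {
  const elements = dimensions.reduce((acc, d) => acc * d, 1);
  return elements * BYTES_PER_ELEMENT[dataType];
}

byteLength({ dataType: 'float32', dimensions: [3, 4] }); // 48 bytes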

If (3) is possible, it seems like the simplest path forward; did you have a chance to investigate this?

This is not possible without several data copies (e.g. where does the data go when writeBuffer() is called?). This also falls apart if MLBuffer is not type-safe and can be assumed to be recasted/reshaped to any dtype and shape: #542 (comment)

@bbernhar
Author

bbernhar commented May 9, 2024

If an ML framework wants to allocate a GPUBuffer, how is that relevant to WebNN? Could you please elaborate on this point?

The developer has to know the layout in order to calculate offsets which split up and re-use a larger buffer piecemeal. Note: a linear layout does not dictate how MLBuffer gets implemented; it could actually be non-contiguous. In WebGPU, GPUBuffer layout is known (and linear), so web developers can implement IAllocator on top of GPUBuffer. If we don't allow createBuffer(size), then that problem gets punted into the WebNN runtime. If the DML backend called CreateCommittedResource() on every call to createBuffer(), our first-inference performance would be awful, which is why compute() already implements its own IAllocator. But since MLBuffers are pre-allocated before build(), we can't just FIFO it and be done with it.

This is not possible without several data copies

Bummer. The more I think about it, the more likely it is that MLBuffer needs to behave like MLTensor. DML can emulate MLMultiArray ops but not vice versa.

@a-sully
Contributor

a-sully commented May 9, 2024

The more I think about it, the more likely it is that MLBuffer needs to behave like MLTensor

Ah yes, this is what I've been advocating for but without using that specific vocabulary 😛

@bbernhar
Author

@a-sully

If the layout of MLBuffer will be unknown, we also need to specify a way for the web developer to initialize tensor data, as readBuffer() and writeBuffer() assumed the layout was linear. For zero-copy, it seems MLBuffer must index into an MLMultiArray, since createBuffer(MLOperandDescriptor) wouldn't accept an ArrayBufferView.

Could you help me understand the plan there?

@a-sully
Contributor

a-sully commented May 10, 2024

Hmmm I thought it was a given (based on my earlier comments here) that readBuffer() and writeBuffer() would not be zero-copy. A closer look at the CoreML API has convinced me that guaranteed zero-copy buffer-mapping from JS is not possible (since again, initWithDataPointer would still result in copies) - and as I stated in that earlier comment, I don't think this is too big of a deal, at least for inputs and outputs (constants may be a different story)

My claim - if we assume that readBuffer() and writeBuffer() will have copies - is that the web platform layer should always be able to provide the caller the illusion of linear memory, even if it's not linear under the hood. The MLMultiArray's subscript(_:) method provides this abstraction, for example. Do you see any issues with this approach?
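
In other words, the JS-visible contract could stay exactly as proposed; a sketch of that contract (buffer and data names are assumed):

// writeBuffer()/readBuffer() present a linear view regardless of how the
// backend actually lays the tensor out (e.g. a strided MLMultiArray).
mlContext.writeBuffer(mlBuffer, new Float32Array([1, 2, 3, 4])); // copy in
const bytes = await mlContext.readBuffer(mlBuffer);              // copy out
// The implementation translates linear offsets into the native layout,
// analogous to MLMultiArray's subscript(_:) accessor.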

@bbernhar
Author

Do you see any issues with this approach?

Nope, the proposed change SGTM then. I wasn't sure where offset translation was occurring (now I understand it's an impl. detail). Thanks for answering.

@bbernhar
Author

bbernhar commented Jun 11, 2024

A couple issues were re-raised today by @huningxin during @a-sully's prototyping of buffer usages.

Summarized as follows:

  1. Should createBuffer() be given a default usage at creation (ex. INPUT|OUTPUT)?
  2. OUTPUT alone cannot disambiguate between buffers that are "on-device only" and buffers that will be efficiently read back via readBuffer().

The use-case for (2) is when an MLBuffer output gets imported into WebGPU and readBuffer() is never called (either WebGPU is the final destination or WebNN re-uses the output). An "on-device only" usage is unique because it offers better bandwidth, notably on dGPUs.

For 1) I see value in assuming INPUT|OUTPUT upon creation because it allows the web developer to forget about usages or tracking buffers-by-usage, esp. if performance isn't an issue.

For 2) shall we consider prefixing CPU access visibility? A sketch follows the list below.

  • CPU_INPUT: CPU write optimal, slow GPU read/write
  • CPU_OUTPUT: CPU read optimal, slow GPU read/write
  • OUTPUT: CPU has no access, fast GPU read/write
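
A sketch of how these proposed usages might look in practice (the usage names above are still tentative; graph and inputData are assumed to exist):

const desc = { dataType: 'float32', dimensions: [2, 2] };
const upload   = await mlContext.createBuffer({ ...desc, usage: 'CPU_INPUT' });
const readback = await mlContext.createBuffer({ ...desc, usage: 'CPU_OUTPUT' });
const onDevice = await mlContext.createBuffer({ ...desc, usage: 'OUTPUT' });

mlContext.writeBuffer(upload, inputData);                   // fast CPU write
mlContext.dispatch(graph, { x: upload }, { y: readback });
const result = await mlContext.readBuffer(readback);        // fast CPU read
// onDevice stays GPU-resident, e.g. for re-use by a later dispatch().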

Appreciate any thoughts/feedback.

@RafaelCintron @huningxin

@huningxin
Contributor

huningxin commented Jun 13, 2024

For 2) shall we consider prefixing CPU access visibility?

  • CPU_INPUT: CPU write optimal, slow GPU read/write
  • CPU_OUTPUT: CPU read optimal, slow GPU read/write
  • OUTPUT: CPU has no access, fast GPU read/write

+1. Regarding enum value naming, should we consider using something like the D3D12_HEAP_TYPE enumeration?

  • UPLOAD: CPU write optimal, slow GPU read/write
  • READBACK: CPU read optimal, slow GPU read/write
  • DEFAULT: CPU has no access, fast GPU read/write

INPUT|OUTPUT

Do we need to distinguish whether a GPU buffer is used for graph input or output? I mean, how would an implementation handle INPUT and OUTPUT differently?

@bbernhar
Author

@huningxin Thanks for the comments.

+1. Regarding enum value naming, should we consider using something like the D3D12_HEAP_TYPE enumeration?

The underlying memory/heap type used by the WebNN implementation could be determined based on the usage alone. See WebGPU: https://www.w3.org/TR/webgpu/#programming-model-resource-usages

Do we need to distinguish whether a GPU buffer is used for graph input or output? I mean, how would an implementation handle INPUT and OUTPUT differently?

The WebNN runtime would use INPUT or OUTPUT to create buffers in write-combined or write-back memory (aka UPLOAD and READBACK per this table) and could validate that the usage matches: INPUT => dispatch(input, ...).

  • CPU_INPUT: must be dispatched input, writeBuffer() is fast, readBuffer() is slow.
  • CPU_OUTPUT: must be dispatched output, readBuffer() is fast, writeBuffer() is slow.
  • OUTPUT: must be a dispatched output, cannot use writeBuffer() or readBuffer().
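
A sketch of how that validation might surface to the web developer (hypothetical error behavior; desc, graph, inputs, and outputs are assumed):

const out = await mlContext.createBuffer({ ...desc, usage: 'OUTPUT' });
mlContext.dispatch(graph, inputs, { y: out });   // OK: dispatched as an output
await mlContext.readBuffer(out);                 // rejects: CPU has no access
mlContext.dispatch(graph, { x: out }, outputs);  // rejects: OUTPUT must be a dispatched output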

aarongable pushed a commit to chromium/chromium that referenced this issue Jun 15, 2024
This CL gives MLBufferDescriptor an MLOperandDescriptor as per
webmachinelearning/webnn#542

To represent this descriptor, this CL also creates a new typemapped
OperandDescriptor type which ensures that the buffer descriptor is
valid. OperandDescriptor will be used more pervasively within WebNN
in follow-up CLs

1) Move Operand::DataType to DataType (MERGED)
2) Create a typemapped OperandDescriptor class for MLBuffer <-- this CL
3) Use OperandDescriptor in mojom::Operand
4+) Remove duplicate code (especially with //components)

Fuchsia binary size seems to be unavoidable for now, and I suspect
may be temporary once duplicate code is removed in follow-ups.
bloaty shows a binary size increase primarily in
//t/b/r/m/ml/webnn/ml_graph_type_converter.cc, as well as a handful
of other renderer-side files which depend on the mojom component

Bug: 343638938, 325598628
Fuchsia-Binary-Size: See commit description
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel
Change-Id: I775340f5c5e0e80942332cbae750d0d305cdd458
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5604163
Reviewed-by: ningxin hu <[email protected]>
Commit-Queue: Austin Sullivan <[email protected]>
Reviewed-by: Alex Gough <[email protected]>
Reviewed-by: Reilly Grant <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1315553}
guschmue pushed a commit to microsoft/onnxruntime that referenced this issue Jul 8, 2024
### Description
This PR enables the API added in #20816 as well as moving context
creation to JS.

### Motivation and Context
In order to enable I/O Binding with the upcoming
[MLBuffer](webmachinelearning/webnn#542) API
in the WebNN specification, we need to share the same `MLContext` across
multiple sessions. This is because `MLBuffer`s are restricted to the
`MLContext` where they were created. This PR enables developers to use
the same `MLContext` across multiple sessions.
@bbernhar
Author

bbernhar commented Aug 6, 2024

@a-sully @reillyeon @huningxin @RafaelCintron

Thoughts/concerns with introducing the (proposed) buffer creation usages below?

For context, these new usages allow DML to correctly configure (and directly map) memory properties upon createBuffer() [1], and would determine how an MLBuffer may be used after creation. WebNN backend APIs that do not require this merely validate that the usage is allowed.

MLBufferUsage(s):

  • JS_READ: buffer can be used with readBuffer(). Can be combined with JS_WRITE.
  • JS_WRITE: buffer can be used with writeBuffer(). Can be combined with JS_READ.
  • JS_NONE: buffer can only be used for dispatch(). Cannot be combined with JS_WRITE or JS_READ.

JS example

const output = await mlContext.createBuffer({
  usage: MLBufferUsage.JS_READ
});
await mlContext.readBuffer(output); // OK
mlContext.writeBuffer(output, ..); // throws error

[1] https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/context_impl_dml.cc;drc=0c5a4a1c3588e362ca65d556ff3a7fee3b3b31b8;l=246

@a-sully
Contributor

a-sully commented Aug 6, 2024

JS example

const output = await mlContext.createBuffer({
  usage: GPUBufferUsage.JS_WRITE
});
await mlContext.readBuffer(output); // OK
mlContext.writeBuffer(output, ..); // throws error

nit: Did you mean to use MLBufferUsage.JS_READ in this example?


Eventually we'll need a flag to indicate that this buffer may be shared with WebGPU. As I've discussed elsewhere, this dictates how an MLBuffer should be allocated on Mac. That's a separate issue (#688) that I'm not trying to solve here, though it would be nice to have an idea of how the proposed MLBufferUsage flags will interact with that flag (e.g. #688 suggests that importing an MLBuffer into WebGPU will yield a GPUBuffer with GPUBufferUsageFlags.STORAGE and GPUBufferUsageFlags.COPY_SRC flags. Is this true/allowed in all cases?)

Overall this seems reasonable, though I do have a few thoughts:

  • I don't think "JS" should be in the name (e.g. this API may also be used by TypeScript or Wasm)
  • Ideally the usage flags signal what can be done rather than what can't. So rather than JS_NONE it could be DISPATCH...
  • ...or if all MLBuffers have the ability to be used with dispatch(), then this is implied and we don't need this flag at all. Not passing any other usage flags would map to D3D12_HEAP_TYPE_DEFAULT

Thoughts on:

  • READ_FROM: buffer can be used with readBuffer(). Can be combined with WRITE_TO
  • WRITE_TO: buffer can be used with writeBuffer(). Can be combined with READ_FROM
  • (eventually) WEB_GPU_INTEROP: buffer can be used with GPUDevice.importExternalBuffer(). Can be combined with ???
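
A sketch of how the renamed flags might combine at creation (names as proposed above; nothing here is specified yet):

const buffer = await mlContext.createBuffer({
  dataType: 'float32',
  dimensions: [2, 2],
  usage: MLBufferUsage.READ_FROM | MLBufferUsage.WRITE_TO,
});
mlContext.writeBuffer(buffer, new Float32Array(4));  // allowed by WRITE_TO
const copy = await mlContext.readBuffer(buffer);     // allowed by READ_FROM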

@bbernhar
Author

bbernhar commented Aug 6, 2024

Thanks @a-sully for the feedback.

nit: Did you mean to use MLBufferUsage.JS_READ in this example?

Good catch, fixed.

and we don't need this flag at all

SGTM.

Thoughts on:

  • READ_FROM: buffer can be used with readBuffer(). Can be combined with WRITE_TO
  • WRITE_TO: buffer can be used with writeBuffer(). Can be combined with READ_FROM

SGTM.

(eventually) WEB_GPU_INTEROP: buffer can be used with GPUDevice.importExternalBuffer(). Can be combined with ???

With any other WebNN usages. Calling importExternalBuffer() could simply ignore them, as the MLBuffer is (currently) treated as having the WebGPU-equivalent usage of STORAGE and is neutered.
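
A sketch of that eventual interop path (entirely hypothetical: WEB_GPU_INTEROP and GPUDevice.importExternalBuffer() are only proposed, per #688):

const shared = await mlContext.createBuffer({
  dataType: 'float16',
  dimensions: [2, 2],
  usage: MLBufferUsage.WEB_GPU_INTEROP | MLBufferUsage.READ_FROM,
});
const gpuBuffer = gpuDevice.importExternalBuffer(shared);
// Per #688, gpuBuffer would carry STORAGE | COPY_SRC usage, and `shared`
// would be neutered (unusable by WebNN) while WebGPU owns the memory.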

aarongable pushed a commit to chromium/chromium that referenced this issue Aug 21, 2024
Exposes MLBufferUsageFlags to MLBufferDescriptor and adds new usages to
maximize device memory bandwidth. After this change, createBuffer()
assumes "no usage" by default. To readBuffer() or writeBuffer(), the
corresponding usage flag must be specified by the web developer.
Combining usages is allowed but could be inefficient. Usages are
always validated even if a backend doesn't use it.

webmachinelearning/webnn#542

Bug: 343638938
Change-Id: I4d78e3f8bacd7cbabce3038c234c062c7c07b095
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5787041
Commit-Queue: Bryan Bernhart <[email protected]>
Reviewed-by: Alex Gough <[email protected]>
Reviewed-by: ningxin hu <[email protected]>
Reviewed-by: Austin Sullivan <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1344910}