[MLBuffer] Creation and representing MLBuffer on XPU devices #542
Comments
My recent investigation into supporting …

1. We need a WebGPU usage flag (at minimum). The only zero-copy way to pass a buffer to both WebGPU (as an …) … If the …

2. …
Thanks @a-sully for delving into the CoreML side of things. Regarding the need for a WebGPU usage flag: is it feasible for an MLBuffer to always be created as an MLMultiArray where, upon import to WebGPU, we could assign or request the usages? Assigning GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC appears to be sufficient. As for the question about keeping MLSize64: without …
AFAICT an
Hmm I'm not sure if I understand your concern.... Implementations are still welcome to allocate an Is the use case you're referring to one where a single
This hides an internal reinterpretation of the data type and dimensions of what's assumed to be an opaque bag of bytes. I think there's a reasonable argument that this implementation detail should not make its way into the WebNN spec, which shouldn't prescribe a particular implementation. WebNN has Could you elaborate on the use case(s) you have in mind?
What would be the expected behavior on platforms which require a data type and dimensions when the buffer is allocated? An |
I was expecting we start the allocation in CoreML via
Consider a native C++ framework which implements a |
Ah, I think my wording of "We need a WebGPU usage flag" above was misleading. I'm not suggesting that we need WebGPU usage flags here, but rather a usage flag saying "I want this Could you also clarify what exactly you mean by "start the allocation in CoreML"? I assume you mean "as an
Thanks for the explanation. Could you provide a concrete example of where this concern is relevant? A quick glance at some common ML frameworks suggests that just Another consideration is that |
Could we pass a union to
Yes, the ORT web tensor dtype can only be implemented behind a "malloc"-like C interface. When WebNN is used as an EP, it exists within the ML runtime itself.
I don't understand this comment because all of the |
Notice the |
Taking a step back, the Web Platform Design Principles implore us to "design based on user needs, not the underlying API or hardware":
The use cases for an
The point about considering flexibility for new hardware is especially pertinent to WebNN :) While I understand the desire to design a web platform API which (especially WASM) user-space frameworks can easily plug into, the web platform API should not bend over backwards to accommodate the implementation choices of any given framework. And the web platform API certainly should not bake in assumptions based on the current limitations of said frameworks! In this case, ORT does not support CoreML in cases where an |
[@a-sully wrote]
For Apple platforms, my understanding is you can go from … Why would IOSurfaces be required?
Seems doable. |
Good question! I originally thought so too, but my current understanding is that this is not generically true (i.e. for all data types). If anyone can definitively confirm or dispute this understanding (@mwyrzykowski?) please speak up! Alright here goes... The docs of
whereas the docs for
so I would assume that this would not be allowed (or at least not be zero-copy) unless the How can we ensure an Of all the The first one looks promising! Unfortunately it seems - based on past offline discussions - that CoreML internally makes a copy of the bytes when using this constructor. That So this would not be zero-copy:
The latter constructor takes a
So with regard to this question…
It seems that the only way to avoid copies of a backing memory which is to be shared as both an
It is zero copy in CoreML but anything other than fp16 + CVPixelBuffer will result in a copy below CoreML |
Not all HW APIs require a Unless |
@a-sully Thinking of a way forward to unblock CoreML. Here are the options I've gathered:
I am not a fan of (1) because it bakes assumptions into the WebNN spec (ex. ORT never pre-allocates or uses untyped buffers). Untyped buffers (aka byte buffers with a linear layout) for example, could be partially dispatched via a The other option (2), means WebNN backends (ex. DML resources) must be re-implemented to work like |
Thanks for the input @bbernhar. I've been exploring this space more, and I still believe the path forward if we want "a device-based storage object that may be used by WebNN operations" is the following: 4. Use Responses inline:
Hmm I'm not following here. The question is not whether HW APIs need an If an ML framework wants to allocate a
Please refer back to #542 (comment). The WebNN spec should not prescribe implementation details, such as that the buffer must be allocated contiguously. This violates the design principles here: https://w3ctag.github.io/design-principles/#usecase-oriented-apis
I don't understand this suggestion.
This is not possible without several data copies (e.g. where does the data go when |
The developer has to know the layout in order to calculate offsets which split-up and re-use a larger buffer piece-meal. Note: a linear layout does not dictate how
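To make the offset arithmetic being described concrete, here is a minimal sketch of splitting one larger linear allocation among several operands. All names, the element-size table, and the 256-byte alignment are illustrative assumptions, not part of the proposal:

```javascript
// Bytes per element for a few common operand data types (illustrative).
const BYTES_PER_ELEMENT = { float32: 4, float16: 2, int32: 4, uint8: 1 };

// Size in bytes of one operand described by { dataType, dimensions }.
function byteLength(descriptor) {
  const elements = descriptor.dimensions.reduce((a, d) => a * d, 1);
  return elements * BYTES_PER_ELEMENT[descriptor.dataType];
}

// Lay out several operands piece-meal inside one linear buffer,
// aligning each sub-range. The alignment value is an assumption.
function layout(descriptors, alignment = 256) {
  let offset = 0;
  return descriptors.map((desc) => {
    const start = Math.ceil(offset / alignment) * alignment;
    offset = start + byteLength(desc);
    return { ...desc, byteOffset: start, byteLength: byteLength(desc) };
  });
}
```

This is the kind of bookkeeping a developer (or framework) would do today on top of an untyped buffer; the question in this thread is whether the web API should expose enough layout information to make it possible.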
Bummer. The more I think about it, the more likely |
Ah yes, this is what I've been advocating for but without using that specific vocabulary 😛 |
If the layout of Could you help me understand the plan there? |
Hmmm I thought it was a given (based on my earlier comments here) that My claim - if we assume that |
Nope, the proposed change SGTM then. I wasn't sure where offset translation was occurring (now I understand it's an implementation detail). Thanks for answering.
A couple issues were re-raised today by @huningxin during @a-sully's prototyping of buffer usages. Summarized as follows:
The use case for (2) is when a …

For (1), I see value assuming …

For (2), shall we consider prefixing CPU access visibility?
Appreciate any thoughts/feedback. |
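To make the read/write usage distinction above concrete, here is a minimal sketch of how usage bits could gate readBuffer()/writeBuffer(). The flag names mirror the proposal in this thread, but the bit values and validation helpers are assumptions, not the spec:

```javascript
// Hypothetical usage bits, mirroring the proposed MLBufferUsage flags.
const MLBufferUsage = {
  READ_FROM: 0x1, // buffer contents may be read back by script
  WRITE_TO: 0x2,  // buffer contents may be written from script
};

// Validation an implementation might perform before servicing readBuffer().
function validateRead(buffer) {
  if (!(buffer.usage & MLBufferUsage.READ_FROM)) {
    throw new TypeError('MLBuffer was created without READ_FROM usage');
  }
}

// Likewise for writeBuffer().
function validateWrite(buffer) {
  if (!(buffer.usage & MLBufferUsage.WRITE_TO)) {
    throw new TypeError('MLBuffer was created without WRITE_TO usage');
  }
}
```

Combining usages would pass both checks, at the possible cost of a less efficient memory placement, which matches the "combining usages is allowed but could be inefficient" behavior described later in the thread.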
+1. Regarding enum value naming, should we consider using something like the D3D12_HEAP_TYPE enumeration?
Do we need to distinguish whether a GPU buffer is used for graph input or output? I mean, how would an implementation handle |
@huningxin Thanks for the comments.
The underlying memory/heap type used by the WebNN implementation could be determined based on the usage alone. See WebGPU: https://www.w3.org/TR/webgpu/#programming-model-resource-usages
WebNN runtime would use
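Under the WebGPU-style resource-usage model referenced above, the underlying memory/heap type can be derived from the usage alone. A minimal sketch of that derivation follows; the flag values and the D3D12-style heap names are assumptions for illustration, not the spec:

```javascript
// Hypothetical usage bits, mirroring the proposed MLBufferUsage flags.
const MLBufferUsage = { READ_FROM: 0x1, WRITE_TO: 0x2 };

// Derive a heap type from usage, analogous to how WebGPU implementations
// choose memory properties from GPUBufferUsage. When both CPU-access bits
// are set, READ_FROM wins here for simplicity; a real implementation would
// pick whatever placement best serves the combination.
function heapTypeForUsage(usage) {
  if (usage & MLBufferUsage.READ_FROM) return 'READBACK'; // CPU-readable
  if (usage & MLBufferUsage.WRITE_TO) return 'UPLOAD';    // CPU-writable
  return 'DEFAULT';                                       // device-local only
}
```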
This CL gives MLBufferDescriptor an MLOperandDescriptor as per webmachinelearning/webnn#542 To represent this descriptor, this CL also creates a new typemapped OperandDescriptor type which ensures that the buffer descriptor is valid. OperandDescriptor will be used more pervasively within WebNN in follow-up CLs 1) Move Operand::DataType to DataType (MERGED) 2) Create a typemapped OperandDescriptor class for MLBuffer <-- this CL 3) Use OperandDescriptor in mojom::Operand 4+) Remove duplicate code (especially with //components) Bug: 343638938, 325598628 Cq-Include-Trybots: luci.chromium.try:win11-blink-rel Change-Id: I775340f5c5e0e80942332cbae750d0d305cdd458
…riptor, a=testonly Automatic update from web-platform-tests webnn: Give an MLBuffer an MLOperandDescriptor This CL gives MLBufferDescriptor an MLOperandDescriptor as per webmachinelearning/webnn#542 To represent this descriptor, this CL also creates a new typemapped OperandDescriptor type which ensures that the buffer descriptor is valid. OperandDescriptor will be used more pervasively within WebNN in follow-up CLs 1) Move Operand::DataType to DataType (MERGED) 2) Create a typemapped OperandDescriptor class for MLBuffer <-- this CL 3) Use OperandDescriptor in mojom::Operand 4+) Remove duplicate code (especially with //components) Fuchsia binary size seems to be unavoidable for now, and I suspect may be temporary once duplicate code is removed in follow-ups. bloaty shows a binary size increase primarily in //t/b/r/m/ml/webnn/ml_graph_type_converter.cc, as well as a handful of other renderer-side files which depend on the mojom component Bug: 343638938, 325598628 Fuchsia-Binary-Size: See commit description Cq-Include-Trybots: luci.chromium.try:win11-blink-rel Change-Id: I775340f5c5e0e80942332cbae750d0d305cdd458 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5604163 Reviewed-by: ningxin hu <[email protected]> Commit-Queue: Austin Sullivan <[email protected]> Reviewed-by: Alex Gough <[email protected]> Reviewed-by: Reilly Grant <[email protected]> Cr-Commit-Position: refs/heads/main@{#1315553} -- wpt-commits: ed9e9309618bdf76de06ff85757edbc8e1d7da82 wpt-pr: 46770
### Description This PR enables the API added in #20816 as well as moving context creation to JS. ### Motivation and Context In order to enable I/O Binding with the upcoming [MLBuffer](webmachinelearning/webnn#542) API in the WebNN specification, we need to share the same `MLContext` across multiple sessions. This is because `MLBuffer`s are restricted to the `MLContext` where they were created. This PR enables developers to use the same `MLContext` across multiple sessions.
@a-sully @reillyeon @huningxin @RafaelCintron Thoughts/concerns with introducing the (proposed) buffer creation usages below? For context, these new usages allow DML to correctly configure (and directly map) memory properties upon createBuffer() [1] and would determine how an MLBuffer may be used. MLBufferUsage(s):
JS example:

```js
const output = await mlContext.createBuffer({
  usage: GPUBufferUsage.JS_READ
});

await mlContext.readBuffer(output); // OK
mlContext.writeBuffer(output, ..); // throws error
```
nit: Did you mean to use …

Eventually we'll need a flag to indicate that this buffer may be shared with WebGPU. As I've discussed elsewhere, this dictates how an …

Overall this seems reasonable, though I do have a few thoughts:
Thoughts on:
Thanks @a-sully for the feedback.
Good catch, fixed.
SGTM.
SGTM.
With any other WebNN usages. Calling importExternalBuffer() could simply ignore them as |
Exposes MLBufferUsageFlags to MLBufferDescriptor and adds new usages to maximize device memory bandwidth. After this change, createBuffer() assumes "no usage" by default. To readBuffer() or writeBuffer(), the corresponding usage flag must be specified by the web developer. Combining usages is allowed but could be inefficient. Usages are always validated even if a backend doesn't use it. webmachinelearning/webnn#542 Bug: 343638938 Change-Id: I4d78e3f8bacd7cbabce3038c234c062c7c07b095 Cq-Include-Trybots: luci.chromium.try:win11-blink-rel
…, a=testonly Automatic update from web-platform-tests WebNN: add buffer usages for DML backend Exposes MLBufferUsageFlags to MLBufferDescriptor and adds new usages to maximize device memory bandwidth. After this change, createBuffer() assumes "no usage" by default. To readBuffer() or writeBuffer(), the corresponding usage flag must be specified by the web developer. Combining usages is allowed but could be inefficient. Usages are always validated even if a backend doesn't use it. webmachinelearning/webnn#542 Bug: 343638938 Change-Id: I4d78e3f8bacd7cbabce3038c234c062c7c07b095 Cq-Include-Trybots: luci.chromium.try:win11-blink-rel Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5787041 Commit-Queue: Bryan Bernhart <[email protected]> Reviewed-by: Alex Gough <[email protected]> Reviewed-by: ningxin hu <[email protected]> Reviewed-by: Austin Sullivan <[email protected]> Cr-Commit-Position: refs/heads/main@{#1344910} -- wpt-commits: f21d93823c4ca8b1cb01b3ff1730af9c049840e5 wpt-pr: 47718
Purpose/Motivation
Defines a device-based storage object that may be used by WebNN operations. This is a sub-issue of #482.
Proposed API
Example JS
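As an illustrative sketch of the lifecycle discussed in this thread — create, write, dispatch, read — here is a mock-based example that can run stand-alone. MockMLContext and its synchronous readBuffer are assumptions for demonstration; the real API is asynchronous, device-backed, and executes a compiled MLGraph:

```javascript
// Mock MLContext demonstrating the proposed MLBuffer lifecycle.
// Illustrative only: the real readBuffer() returns a Promise<ArrayBuffer>,
// and dispatch() executes a compiled graph on the device.
class MockMLContext {
  createBuffer({ dataType, dimensions, usage }) {
    const elements = dimensions.reduce((a, d) => a * d, 1);
    return { dataType, dimensions, usage, data: new Float32Array(elements) };
  }
  writeBuffer(buffer, srcData) {
    buffer.data.set(srcData); // upload from script to the buffer
  }
  readBuffer(buffer) {
    return buffer.data.slice().buffer; // copy contents out as an ArrayBuffer
  }
  dispatch(graph, inputs, outputs) {
    // Stand-in for graph execution: copy input to output to show data flow.
    outputs.output.data.set(inputs.input.data);
  }
}
```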
Edits
MLOperandDescriptor
Alternative API proposals
N/A
Opens