Bugfix: Add missing 64-bit integers support for some reduction operators #695

Draft · wants to merge 1 commit into main

Conversation

@huningxin (Contributor) commented May 27, 2024

reduceL1, reduceProduct, reduceSum and reduceSumSquare already support 32-bit integers. 64-bit integers should also be supported.

Fix #283, #694
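
For illustration, a minimal WebNN sketch (not part of this PR's text; assumes it runs inside an async function) of what the change allows, namely a reduction over an int64 operand. Descriptor member names follow the spec as of this PR:

```js
// Sketch: build a reduceSum over an int64 operand, which this change makes a
// valid data type for reduceL1, reduceProduct, reduceSum and reduceSumSquare.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

// A 2-D int64 input, e.g. an index-like tensor produced elsewhere in the model.
const input = builder.input('input', {dataType: 'int64', dimensions: [1, 3]});

// Reduce along axis 1; previously only 32-bit integers and floats were valid here.
const output = builder.reduceSum(input, {axes: [1], keepDimensions: false});

const graph = await builder.build({output});
```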


@huningxin requested review from @fdwr and @inexorabletash on May 27, 2024
@Honry (Contributor) commented May 27, 2024

@fdwr, @huningxin, starting from ONNX opset 18, all Reduce* ops support int64 and uint64 in the DML EP; see

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/OperatorRegistration.cpp#L937-L979

Meanwhile, https://learn.microsoft.com/en-us/windows/ai/directml/dml-feature-level-history#dml_feature_level_5_0 mentions the expanded data type support for:

  • DML_REDUCE_FUNCTION_L1
  • DML_REDUCE_FUNCTION_MAX
  • DML_REDUCE_FUNCTION_MIN
  • DML_REDUCE_FUNCTION_MULTIPLY
  • DML_REDUCE_FUNCTION_SUM
  • DML_REDUCE_FUNCTION_SUM_SQUARE

But it doesn't mention any data type expansion for DML_REDUCE_FUNCTION_AVERAGE (reduceMean) and the other reduce functions, which presumably still only support floating-point values. Is it a documentation issue?

@inexorabletash (Member) left a comment

LGTM

@a-sully (Contributor) commented May 28, 2024

#654 proposes removing 64-bit int support for some operators. What's the rationale for adding 64-bit int support for these operators?

This may be a discussion for another issue, but does WebNN need to support 64-bit integer types at all, given trends towards smaller data types (e.g. int4 and int8) for on-device ML and the fact that some backends have little/no support for 64-bit values in the first place? (e.g. CoreML has ~no support, and most GPUs emulate support for int64)

@huningxin (Contributor Author)

@a-sully

#654 proposes removing 64-bit int support for some operators.

IIUC, #654 mentioned that lower feature level DirectML (before FL 4.1) doesn't support 64-bit integers for some operators. Higher feature level DirectML doesn't have that issue. As @inexorabletash mentioned, we may assume the browser always carries a copy of the library that ensures the highest feature level. If that's the case, we may close that issue.

What's the rationale for adding 64-bit int support for these operators?

As #694 mentioned, the safety checker model of the Stable Diffusion Turbo demo uses reduceSum with int64 input, and DirectML's DML_REDUCE_FUNCTION_SUM supports 64-bit integers.

some backends have little/no support for 64-bit values in the first place? (e.g. CoreML has ~no support, and most GPUs emulate support for int64)

We may want to track the backend differences through #463.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request May 31, 2024
This CL adds 64-bit integer support for reduceL1, reduceProduct,
reduceSum and reduceSumSquare. It's based on the spec change being proposed by webmachinelearning/webnn#695.

Bug: 328567884
Change-Id: Ia858b47082f81a9eb6ab3b9403e3773a752eb608
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 3, 2024
This CL adds 64-bit integer support for reduceL1, reduceProduct,
reduceSum and reduceSumSquare. It's based on the spec change being proposed by webmachinelearning/webnn#695.

Bug: 328567884
Change-Id: Ia858b47082f81a9eb6ab3b9403e3773a752eb608
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5569544
Reviewed-by: ningxin hu <[email protected]>
Reviewed-by: Austin Sullivan <[email protected]>
Commit-Queue: Lisha Guo <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1309157}
@a-sully (Contributor) commented Jun 3, 2024

What's the rationale for adding 64-bit int support for these operators?

As #694 mentioned, the safety checker model of the Stable Diffusion Turbo demo uses reduceSum with int64 input, and DirectML's DML_REDUCE_FUNCTION_SUM supports 64-bit integers.

I know this is already implemented in Chromium, so I don't mean to quibble too much over this, but I think it's worth questioning whether there should be a process for changing supported data types, similar to the existing process for adding new operators.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jun 5, 2024
…rt for some reduce operators, a=testonly

Automatic update from web-platform-tests
WebNN: Add missing 64-bit integers support for some reduce operators

This CL adds 64-bit integer support for reduceL1, reduceProduct,
reduceSum and reduceSumSquare. It's based on the spec change being proposed by webmachinelearning/webnn#695.

Bug: 328567884
Change-Id: Ia858b47082f81a9eb6ab3b9403e3773a752eb608
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5569544
Reviewed-by: ningxin hu <[email protected]>
Reviewed-by: Austin Sullivan <[email protected]>
Commit-Queue: Lisha Guo <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1309157}

--

wpt-commits: 50c7448895efb9f6cfe0b37d77d018ea549d4b99
wpt-pr: 46568
@huningxin changed the title from "Add missing 64-bit integers support for some reduction operators" to "Bugfix: Add missing 64-bit integers support for some reduction operators" on Jun 11, 2024
@huningxin (Contributor Author)

@a-sully

but I think it's worth questioning whether there should be a process for changing supported data types similar to the existing process for adding new operators

I'd support adding a process for updating the operators. A PR is proposed: #705. PTAL.

This PR is a follow-up bug fix for #283. In that issue, @fdwr confirmed that the L1, SUM_SQUARE, MULTIPLY and SUM reduce functions support 32-bit and 64-bit integers. However, I forgot to add 64-bit integers to the table in #283, which caused @inexorabletash's PR #646 to miss the 64-bit integer support for those reduce operators.

As mentioned, this issue causes the safety checker model of the Stable Diffusion Turbo demo to fail because it uses reduceSum with int64 input. Chromium has already fixed it, and I also updated the table in #283.

@fdwr (Collaborator) commented Jun 12, 2024

But it doesn't mention any data type expansion for DML_REDUCE_FUNCTION_AVERAGE (reduceMean) and the other reduce functions, which presumably still only support floating-point values. Is it a documentation issue?

@Honry: The doc is correct. DML_OPERATOR_REDUCE with DML_REDUCE_FUNCTION_AVERAGE (https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_reduce_operator_desc) only supports float. We didn't support int64 for mean/average reduction because:

  • (1) the averaging division would most likely produce a floating-point result anyway
  • (2) it saves shader space if not actually used (and there were no clients for int64 average at the time)
  • (3) it raises policy questions for fractional values: whether to truncate, floor, ceil, round to nearest even...
  • (4) it's trivial to emulate with an explicit REDUCE_SUM followed by DIVIDE (a sketch follows at the end of this comment).

Notice that, similarly, ReduceL1 is supported, but ReduceL2 (which involves a square root that yields fractional values) does not have an int64 version (nor do LogSum, LogSumExp, ...).

{
  "groupFieldValues": ["ARGMIN", "ARGMAX"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["uint32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "2.1", "InputDataType": ["AllTensorDataTypes8To32"], "OutputDataType": ["uint32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["AllTensorDataTypes8To32"], "OutputDataType": ["AllNonFloatTensorDataTypes32To64"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "4.1", "InputDataType": ["AllTensorDataTypes8To64ExceptFloat64"], "OutputDataType": ["AllNonFloatTensorDataTypes32To64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["AVERAGE", "L2", "LOG_SUM", "LOG_SUM_EXP"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["L1", "SUM_SQUARE"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "5.0", "InputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "OutputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["MIN", "MAX"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "2.1", "InputDataType": ["float16", "float32", "uint32", "int32"], "OutputDataType": ["float16", "float32", "uint32", "int32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["AllTensorDataTypes8To32"], "OutputDataType": ["AllTensorDataTypes8To32"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "5.0", "InputDataType": ["AllTensorDataTypes8To64ExceptFloat64"], "OutputDataType": ["AllTensorDataTypes8To64ExceptFloat64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["MULTIPLY", "SUM"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "2.1", "InputDataType": ["float16", "float32", "uint32", "int32"], "OutputDataType": ["float16", "float32", "uint32", "int32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["float16", "float32", "uint32", "int32"], "OutputDataType": ["float16", "float32", "uint32", "int32"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "5.0", "InputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "OutputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
}
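
As a rough illustration of point (4) above, a WebNN-flavored sketch (not the DirectML implementation; `builder` and `input` are assumed to already exist) of emulating an integer mean reduction with an explicit sum and divide, casting to float to sidestep the rounding-policy question from point (3):

```js
// Assume `builder` is an MLGraphBuilder and `input` is an int64 operand of
// shape [1, 3], so 3 elements are reduced along axis 1.
const sum = builder.reduceSum(input, {axes: [1], keepDimensions: false});
// Cast to float32 so the division yields the usual fractional mean; an
// integer mean would instead have to pick truncate/floor/ceil/round.
const sumFloat = builder.cast(sum, 'float32');
const count = builder.constant({dataType: 'float32', dimensions: []},
                               new Float32Array([3]));
const mean = builder.div(sumFloat, count);
```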

@fdwr (Collaborator) left a comment

👍 Thanks Ningxin.

@philloooo (Contributor)

As mentioned, this issue causes the safety checker model of the Stable Diffusion Turbo demo to fail because it uses reduceSum with int64 input.

@huningxin given that int64 is not supported on CoreML, how hard is it for the stable diffusion turbo demo to convert the model from int64 to int32?

@huningxin (Contributor Author)

@philloooo

As mentioned, this issue causes the safety checker model of the Stable Diffusion Turbo demo to fail because it uses reduceSum with int64 input.

@huningxin given that int64 is not supported on CoreML, how hard is it for the stable diffusion turbo demo to convert the model from int64 to int32?

The original model casts to int64 before reduceSum. We'll try to change it to int32 and see whether it still works. Will keep this thread posted.

@huningxin (Contributor Author)

The original model casts to int64 before reduceSum. We'll try to change it to int32 and see whether it still works. Will keep this thread posted.

After the investigation, the two reduceSum operators of the safety checker model can take int32 input. @Honry helped create an int32 version and hosted it at https://huggingface.co/lwanming/sd-turbo-ort-web/blob/main/safety_checker_int32_reduceSum.onnx; feel free to check it out.

More details:

  1. The first reduceSum (named "/ReduceSum") takes a 2-D input tensor of shape [1, 3] and reduces along axis 1. The tensor values are either 0 or 1 because they come from the preceding greater operator, so it's safe to cast the input tensor to int32: summing three 1s cannot overflow.
  2. Similarly, the second reduceSum (named "/ReduceSum_1") takes a 2-D input tensor of shape [1, 17], also the output of a preceding greater operator, and reduces along axis 1. It's safe to cast that input tensor to int32 for the same reason (see the sketch below).
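
For reference, the resulting graph pattern expressed as a WebNN-style sketch; the operand names `scores` and `threshold` are made up for illustration:

```js
// Original model: greater -> cast(int64) -> reduceSum.
// The int32 version simply casts to int32 instead, which cannot overflow
// when summing at most 3 (or 17) values that are each 0 or 1.
const flags = builder.greater(scores, threshold);         // uint8, values 0 or 1
const flagsInt32 = builder.cast(flags, 'int32');          // was int64 in the original
const numFlagged = builder.reduceSum(flagsInt32, {axes: [1]});
```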

@fdwr (Collaborator) commented Jun 13, 2024

@mwyrzykowski Do you know of any plans to update CoreML to support int64? It's oddly inconsistent that all the other Apple ML APIs (BNNS, MPS, MLX) support int64, but CoreML does not 🤔. Is CoreML still the right API these days for implementing a WebNN backend, or has it been left behind by newer ones? Thanks for any information or redirections.

| Type | BNNS | MPS | MLX | CoreML |
|------|------|-----|-----|--------|
| uint8 | BNNSDataTypeUInt8 | MPSDataType.uInt8 | mx.uint8 | x |
| uint16 | BNNSDataTypeUInt16 | MPSDataType.uInt16 | mx.uint16 | x |
| uint32 | BNNSDataTypeUInt32 | MPSDataType.uInt32 | mx.uint32 | x |
| uint64 | BNNSDataTypeUInt64 | MPSDataType.uInt64 | mx.uint64 | x 🤔 |
| int8 | BNNSDataTypeInt8 | MPSDataType.int8 | mx.int8 | x |
| int16 | BNNSDataTypeInt16 | MPSDataType.int16 | mx.int16 | x |
| int32 | BNNSDataTypeInt32 | MPSDataType.int32 | mx.int32 | ArrayFeatureType.ArrayDataType.INT32 |
| int64 | BNNSDataTypeInt64 | MPSDataType.int64 | mx.int64 | x 🤔 |
| float16 (f10e5s1, IEEE) | BNNSDataTypeFloat16 | MPSDataType.float16 | mx.float16 | ArrayFeatureType.ArrayDataType.FLOAT16 |
| bfloat16 (f7e8s1, brain) | BNNSDataTypeBFloat16 | MPSDataType.bFloat16 | x | x |
| float32 (f23e8s1, IEEE) | BNNSDataTypeFloat32 | MPSDataType.float32 | mx.float32 | ArrayFeatureType.ArrayDataType.FLOAT32 |
| float64 (f52e11s1, IEEE) | x | x | x | ArrayFeatureType.ArrayDataType.DOUBLE |
| complex float16 (float16 × 2) | x | MPSDataType.complexFloat16 | x | x |
| complex float32 (float32 × 2) | x | MPSDataType.complexFloat32 | x | x |
| complex float64 (float64 × 2) | x | x | x | x |
| bool8 | BNNSDataTypeBoolean | MPSDataType.bool | bool_ | x |

@mwyrzykowski

@fdwr BNNS only runs on the CPU, and MLX, being open source, cannot use the ANE/NPU. MPS, being backed by Metal, only runs on the GPU.

If running on the ANE is a goal, CoreML is necessary.

@a-sully (Contributor) commented Jun 13, 2024

If running on the ANE is a goal, CoreML is necessary.

Can confirm this is an important goal :)

@fdwr (Collaborator) commented Jun 13, 2024

If running on the ANE is a goal, CoreML is necessary.

Can confirm this is an important goal :)

Concur, this is important (at least for those models that can fully run on the ANE, or those parts of a model that are viable to run on it).

guschmue pushed a commit to microsoft/onnxruntime that referenced this pull request Jun 17, 2024
WebNN Spec adds missing 64-bit integers support for `reduceL1`,
`reduceSum`, `reduceSumSquare` and `reduceProduct` ops at this
[PR](webmachinelearning/webnn#695), which has
already been implemented in Chromium. Update corresponding data type
constraints in WebNN EP.

Besides, WebNN CPU backend currently doesn't support `uint64` and
`uint32` for these ops.
@fdwr (Collaborator) commented Jul 8, 2024

With @philloooo's initial op set limits CR complete 🙂, we can see a path toward conditionally supporting data types (similar to how WebGPU does not support all texture data types on all backends).

@philloooo (Contributor)

I am still curious to know:

  1. How hard is it to convert the models to use int32 instead?
  2. What does the performance look like when hopping between the WASM and WebNN EPs while executing this int64 safety checker model on the CoreML backend?

@fdwr (Collaborator) commented Aug 1, 2024

  1. How hard is it to convert the models to use int32 instead?

@philloooo: Update: the int32 model has been uploaded, thanks to Belem and Adele (https://huggingface.co/microsoft/sd-turbo-webnn/tree/main/safety_checker), and @ibelem has updated the sample (microsoft/webnn-developer-preview#11).

@Honry (Contributor) commented Aug 1, 2024

How hard is it to convert the models to use int32 instead?

@philloooo, thanks to @fdwr's tool https://github.com/fdwr/Onnx2Text, I converted the model into a text file, then manually edited the required parts, e.g. changed the cast from 'int64' to 'int32', changed the input data type of ReduceSum to int32, and made sure its downstream nodes accept int32 inputs, then finally converted the text file back to an ONNX file.

For this safety checker model it was easy to convert to int32; for others, I can't tell how hard it would be. It depends on how complicated the model is, case by case.

@philloooo (Contributor)

@Honry @fdwr thanks!

thanks to @fdwr's tool https://github.com/fdwr/Onnx2Text, I converted the model into a text file, then manually edited the required parts, e.g. changed the cast from 'int64' to 'int32', changed the input data type of ReduceSum to int32, and made sure its downstream nodes accept int32 inputs, then finally converted the text file back to an ONNX file.

Is there a way to automate that process?
And could such tools be offered to developers who would like to achieve the best browser compatibility with the WebNN EP?

@mwyrzykowski left a comment

I agree with @a-sully and others who have raised concerns regarding adding additional 64-bit type support where it does not exist in CoreML.

We could potentially expose some general uint64/int64 extension, or an "int64 on certain operations" extension, but I don't think making it required and expecting backends to emulate it is realistic.

@fdwr (Collaborator) commented Sep 5, 2024

Is there a way to automate that process?

@philloooo: Well, currently it's manual. We may be able to write a tool that applies known patterns (like the reduceSum above), but it's not possible at the model level for all cases (e.g. ONNX ArgMax always outputs int64). There is also an expectation that any legal ONNX model (which could originate from anywhere, as there are myriad tools and exporters nowadays) works without any additional massaging, regardless of the OS or backend. So, seeing what the existing ORT CoreML execution provider does should be enlightening here. Though, it's prudent in any case to add guidance to the spec recommending int32/uint32 over int64/uint64 where possible, for performance.

We could potentially expose some general uint64/int64 extension or int64 on certain operations extension, but I don't think making it required and expecting backends to emulate is realistic.

@mwyrzykowski: It's not required, as different backends will have different capabilities (tensor rank limits, data type support, ...), and opSupportLimits is the extension that informs the caller of those capabilities so it can respond accordingly. Note that although int64 usage is not found in CoreML models, it is rather ubiquitous in the ONNX format, and so supporting that now-large existing corpus of models is important, at least until CoreML catches up with its Apple kin in data type support (BNNSDataTypeInt64, MPSDataType.int64, mx.int64). A very base level of emulation isn't as challenging as it initially sounds, either in the CoreML backend or in callers like ORT, seeing that:

  • ORT already has a functioning CoreML EP (which surely encounters ArgMax implemented here)
  • the only parts visible to the application are the inputs (which could be eagerly downcast) and the outputs (which can then be upcast), meaning that within the graph the backend can do whatever it wants. It doesn't matter whether Schrödinger's cat is alive or dead until actually observed, and it doesn't matter whether the internals are 32-bit or 64-bit until observed, so long as the values are in range, for which the next bullet point is salient...
  • truncating int64 to int32 for index usage (argMin/argMax/gather/scatter/..., which is the most common usage) is safe in practice because WebNN backends currently have tensor element count limits of 4 billion per tensor anyway.

So at a minimum, the one operation in the ORT or CoreML backend that would be most useful to emulate is casting graph inputs/outputs. For example, to cast uint64 to uint32, either skip every other 32-bit word of the ArrayBuffer before uploading to the GPU (a sketch follows after the footnote below), or treat the uint64/int64 tensor input virtually (from CoreML's POV) as a uint32/int32 tensor with the lowest dimension doubled (because every 64-bit element is just 2 x 32-bit elements), and then in pseudocode¹...

func CastUint64ToUint32(uint64TensorAsDoubledUint32)
    // Total number of uint32 elements in the doubled view (2 per uint64 element).
    collapsedElementCount = reduceProduct(uint64TensorAsDoubledUint32.dimensions)
    // Pair up the low/high 32-bit halves of each 64-bit element.
    reshapedTensor = reshape(uint64TensorAsDoubledUint32, [collapsedElementCount / 2, 2])
    // Keep only the low 32 bits of each element (little-endian layout).
    lowBitsUint32Tensor = slice(reshapedTensor, starts=[0, 0], ends=[collapsedElementCount / 2, 1])
    return lowBitsUint32Tensor
    // Carry on as usual, using the uint32 tensor instead.
endfunc

¹ This is basically what the ORT DML EP used to do before native int64 support in DirectML.dll.
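
For completeness, a small sketch of the first option mentioned above (dropping the high 32-bit word of each element on the CPU before upload); the function name is made up, and it assumes little-endian layout and values that actually fit in 32 bits:

```js
function downcastInt64ToInt32(int64ArrayBuffer) {
  const src = new BigInt64Array(int64ArrayBuffer);
  const dst = new Int32Array(src.length);
  for (let i = 0; i < src.length; ++i) {
    dst[i] = Number(BigInt.asIntN(32, src[i]));  // keep only the low 32 bits
  }
  return dst;
}
```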

@mwyrzykowski left a comment

@fdwr I think we should start with the set of operations that is the intersection of what all native frameworks support today, instead of expecting frameworks to catch up. In general, this seems like the wrong direction and will lead to fragmentation where certain models work on certain devices and not others.

@huningxin (Contributor Author) commented Sep 7, 2024

@mwyrzykowski

@fdwr BNNS only runs on the CPU and MLX being open source can not use the ANE / NPU. MPS being backed by Metal only runs on the GPU.

If running on the ANE is a goal, CoreML is necessary.

I am trying to understand whether this is a device-specific limit. Say an implementation mapped the "cpu" MLContext to BNNS, the "gpu" MLContext to MPS and the "npu" MLContext to CoreML; would that mean only the "npu" MLContext doesn't support 64-bit integers? That device-type-specific difference can be detected through the MLContext.opSupportLimits() interface.

As you know, the Chromium prototype on Windows also relies on different native frameworks: TFLite for the "cpu" MLContext and DirectML for the "gpu" and "npu" MLContexts. IIUC, the data types supported by DirectML are also device dependent and can be detected through DML_FEATURE_DATA_TENSOR_DATA_TYPE_SUPPORT.
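
For example, a sketch of such detection from script (assuming the dictionary shape from the op support limits proposal, where each operator exposes per-operand data type lists, and run inside an async function):

```js
const context = await navigator.ml.createContext({deviceType: 'npu'});
const limits = context.opSupportLimits();
// Fall back to building an int32 graph if this context's reduceSum
// can't take int64 input.
const supportsInt64ReduceSum =
    limits.reduceSum.input.dataTypes.includes('int64');
```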

@a-sully (Contributor) commented Sep 9, 2024

I think we should start with the set of operations which is the intersection of which all native frameworks support today, instead of expecting frameworks to catch up. In general, this seems like the wrong direction and will lead to fragmentation where certain models work on certain devices and not others.

+1 to this sentiment. It would be nice if there were a common set of operators and data types that was guaranteed to be supported across platforms, which is what #573 is getting at. Then MLOpSupportLimits would be a way for backends to advertise their support for additional data types, and if a data type eventually became widely supported then it could be added to the common set...

...The question is whether a common set of data types exists. Unfortunately, "float16" is not consistently supported on CPU and GPU backends, and "float32" is not supported on (most? all?) NPUs. So at least as long as MLDeviceType exists (and it may not for much longer, see #749) and each device type naively maps to "run exclusively on this compute unit", I don't think there exists a common set for most operators.

Successfully merging this pull request may close these issues:
Specify the operand data type constraints of operation