Specify the operand data type constraints of operation #283

Closed

huningxin opened this issue Aug 26, 2022 · 22 comments · Fixed by #646 · May be fixed by #695

@huningxin
Contributor

huningxin commented Aug 26, 2022

The current spec doesn't specify the operand data type constraints of an operation. However, some operations, e.g. softmax, should only support floating-point operand types according to the survey of frameworks and native ML APIs in the following table. The lack of operand data type constraints in the spec can lead to implementation issues, such as Chromium CL 3856752.

Framework / API (supported data types):

TensorFlow tf.nn.softmax: half, float32, float64
ONNX Softmax: tensor(float16), tensor(float), tensor(double), tensor(bfloat16)
NNAPI ANEURALNETWORKS_SOFTMAX: FLOAT32, FLOAT16
DirectML DML_ACTIVATION_SOFTMAX_OPERATOR_DESC: FLOAT32, FLOAT16
BNNS ActivationFunction: float

Thanks @wacky6 for pointing this out.
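For illustration only (not from the spec or the Chromium CL): a minimal Python sketch of the kind of per-operation data type check the spec could require implementations to perform. The Operand record, the allowed-type set and validate_softmax are hypothetical names.

import dataclasses

SOFTMAX_ALLOWED_TYPES = {"float32", "float16"}  # hypothetical, per the survey above

@dataclasses.dataclass
class Operand:
    data_type: str
    shape: tuple

def validate_softmax(input_operand: Operand) -> None:
    # Reject operand data types outside the allowed set for this operation.
    if input_operand.data_type not in SOFTMAX_ALLOWED_TYPES:
        raise TypeError(
            f"softmax: unsupported operand data type '{input_operand.data_type}', "
            f"expected one of {sorted(SOFTMAX_ALLOWED_TYPES)}")

validate_softmax(Operand("float32", (2, 4)))   # passes
# validate_softmax(Operand("int32", (2, 4)))   # would raise TypeError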

@huningxin
Contributor Author

There are other operations that should constrain their operands to floating-point types, e.g. batchNormalization, elu, hardSigmoid, hardSwish, instanceNormalization, leakyRelu, linear, sigmoid, softplus, softsign, tanh.

@huningxin huningxin changed the title Softmax should only support input of floating-point types Specify the operand type constraints of operation Sep 21, 2022
@huningxin
Contributor Author

huningxin commented Apr 11, 2023

The prelu op should also only accept floating-point types. This was raised by @wacky6 in the Chromium CL review. Thanks again Jiewei!

@huningxin
Contributor Author

huningxin commented Apr 19, 2023

Among the element-wise unary ops, ceil and floor should only accept floating-point types, while abs and neg should only accept signed types, i.e. signed integer and floating-point types. This feedback is from the Chromium CL review. Thanks @wacky6 and @miaobin!

Additionally, cos, exp, log, sin and tan should only accept floating-point types.
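(A quick numpy aside, not from the CL review, on why neg fits poorly with unsigned types: negating an unsigned integer wraps around instead of producing a negative value.)

import numpy

print(numpy.negative(numpy.array(1, dtype=numpy.uint8)))  # 255: wraps to 256 - 1, not -1
print(numpy.negative(numpy.array(1, dtype=numpy.int8)))   # -1, as expected for a signed type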

@huningxin
Contributor Author

(Feedback raised by @wacky6 from Chromium CL review)

For reduction ops, reduceL2, reduceLogSum, reduceLogSumExp and reduceMean should only accept floating-point types.
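(Another numpy aside, not from the CL review: these reductions fit poorly with integer types because their exact results are generally not integers; numpy, for example, promotes the mean of integers to float.)

import numpy

m = numpy.mean(numpy.array([1, 2], dtype=numpy.int32))
print(m, m.dtype)  # 1.5 float64: the mean of integer values is generally not an integer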

/cc @lisa0314 @fdwr

@fdwr
Collaborator

fdwr commented Oct 25, 2023

reduceL2, reduceLogSum, reduceLogSumExp and reduceMean should only accept floating-point types.

That's consistent with DML's data type support too:

REDUCE:
  ARGMIN, ARGMAX:
    featureLevel: 4.1
    InputDataType:  uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    OutputDataType: uint32, uint64, int32, int64
    MinRank: 1
    MaxRank: 8
  AVERAGE, L2, LOG_SUM, LOG_SUM_EXP:
    featureLevel: 3.0
    InputDataType:  float16, float32
    OutputDataType: float16, float32
    MinRank: 1
    MaxRank: 8
  L1, SUM_SQUARE:
    featureLevel: 5.0
    InputDataType:  uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    OutputDataType: uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    MinRank: 1
    MaxRank: 8
  MIN, MAX:
    featureLevel: 5.0
    InputDataType:  uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    OutputDataType: uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    MinRank: 1
    MaxRank: 8
  MULTIPLY, SUM:
    featureLevel: 5.0
    InputDataType:  uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    OutputDataType: uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32
    MinRank: 1
    MaxRank: 8

@wacky6

wacky6 commented Oct 26, 2023

Just curious, what's the expected behavior for integer overflows for MULTIPLY and SUM?

I guess float overflow should just become Infinity.

@fdwr
Collaborator

fdwr commented Oct 26, 2023

@wacky6

Just curious, what's the expected behavior for integer overflows for MULTIPLY and SUM?

I recall surveying a number of libraries a while back for integers, and they all did two's complement wrap rather than saturate. That is what DML does. I presume XNNPack too?

e.g.

import numpy

x = numpy.array(200, dtype=numpy.uint8)
y = numpy.add(x, x)  # 200 + 200 = 400, which wraps modulo 256 to 144
print("value:", y)
print("shape:", y.shape)

# Prints:
# value: 144
# shape: ()

I guess float overflow should just become Infinity.

👍

@huningxin
Contributor Author

huningxin commented Nov 20, 2023

[WIP] Summary of the operand data type constraints for current WebNN operations.

argMin/argMax

batchNormalization

clamp

concat

conv2d

convTranspose2d

Element-wise binary operations

add

  • a: all supported data types
  • b: same as a
  • output: same as a

sub

  • a: all supported data types
  • b: same as a
  • output: same as a

mul

  • a: all supported data types
  • b: same as a
  • output: same as a

div

  • a: all supported data types
  • b: same as a
  • output: same as a

min

  • a: all supported data types
  • b: same as a
  • output: same as a

max

  • a: all supported data types
  • b: same as a
  • output: same as a

pow

  • a: all supported data types
  • b: same as a
  • output: same as a

Element-wise unary operations

abs

ceil

cos

exp

floor

log

neg

sin

tan

elu

expand

  • input: all supported data types
  • output: same as input

gather

gelu

gemm

gru

gruCell

hardSigmoid

hardSwish

instanceNormalization

layerNormalization

leakyRelu

linear

lstm

lstmCell

matmul

pad

  • input: all supported data types
  • output: same as input

Pooling operations

averagePool2d

l2Pool2d

maxPool2d

  • input: all supported data types
  • output: same as input

prelu

Reduction operations

reduceL1

reduceL2

reduceLogSum

reduceLogSumExp

reduceMax

  • input: all supported data types
  • output: same as input

reduceMean

reduceMin

  • input: all supported data types
  • output: same as input

reduceProduct

reduceSum

reduceSumSquare

relu

resample2d

reshape

  • input: all supported data types
  • output: same as input

sigmoid

slice

  • input: all supported data types
  • output: same as input

softmax

softplus

softsign

split

transpose

  • input: all supported data types
  • output: same as input

triangular

  • input: all supported data types
  • output: same as input

where

@huningxin
Contributor Author

@fdwr , if I read the DML doc correctly, L1, SUM_SQUARE, MULTIPLY, SUM reduce functions only support 32- and 64-bit integers.

https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_reduce_operator_desc

@fdwr
Collaborator

fdwr commented Nov 23, 2023

@fdwr , if I read the DML doc correctly, L1, SUM_SQUARE, MULTIPLY, SUM reduce functions only support 32- and 64-bit integers.

https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_reduce_operator_desc

@huningxin : Correct.

Summary of the operand data type constraints for current WebNN operations

Is this the intersection of DML with XNNPack? (because FL3+ DML ABS supports int16 too, and FL4.1+ supports int8/int16/int32/int64).

@huningxin huningxin changed the title Specify the operand type constraints of operation Specify the operand data type constraints of operation Jan 31, 2024
@inexorabletash
Member

inexorabletash commented Apr 9, 2024

@huningxin in the list above, when an op is listed like this:

batchNormalization
input: float32, float16
mean: same as input

Should that "same as input" statement be interpreted as:
(1) mean's data type has the same restriction as input (i.e. float32 or float16); or
(2) the given mean operand's data type must exactly match the given input operand's data type?

I assume the latter (based on other examples) but wanted to confirm.

@fdwr
Collaborator

fdwr commented Apr 9, 2024

Should that "same as input" statement be interpreted as: (1) mean's data type has the same restriction as input (i.e. float32 or float16); or (2) the given mean operand's data type must exactly match the given input operand's data type?

@inexorabletash Yep, latter. If input is float32, then mean must also be float32.

@inexorabletash
Member

I have a local change for this. Lots of copy/paste. I'm wondering if we want to be table-driven; I'll bring that up in the eventual PR.

The table is missing:

  • softmax, though it kicked off the issue as "float32" and "float16" only
  • softplus, softsign, tanh are mentioned in a comment as floating-point only
  • gelu
  • layerNormalization
  • gru, gruCell
  • lstm, lstmCell
  • argmin/argmax (presumably anything, like most of the reduction ops)
  • expand, gather, slice, transpose, triangular (presumably anything)
  • where (already in the spec: condition must be uint8, the other operands must have matching data types)
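To make the table-driven idea above a bit more concrete, here is a rough Python sketch. The entries, type sets and helper are illustrative only, not the actual spec data.

FLOATS = {"float32", "float16"}
ALL_TYPES = FLOATS | {"int64", "uint64", "int32", "uint32", "int8", "uint8"}

# Illustrative entries only. "same as <operand>" means an exact data type
# match with that operand, per the clarification above.
OP_CONSTRAINTS = {
    "softmax": {"input": FLOATS, "output": "same as input"},
    "add":     {"a": ALL_TYPES, "b": "same as a", "output": "same as a"},
    "where":   {"condition": {"uint8"}, "trueValue": ALL_TYPES,
                "falseValue": "same as trueValue", "output": "same as trueValue"},
}

def check(op, operand_types):
    """operand_types maps input operand name -> data type string."""
    for name, rule in OP_CONSTRAINTS[op].items():
        if name == "output":
            continue
        if isinstance(rule, set):
            assert operand_types[name] in rule, f"{op}: bad data type for {name}"
        else:
            other = rule.removeprefix("same as ")
            assert operand_types[name] == operand_types[other], (
                f"{op}: {name} must exactly match {other}")

check("add", {"a": "float16", "b": "float16"})    # passes
# check("add", {"a": "float16", "b": "float32"})  # fails: b must exactly match a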

@wacky6

wacky6 commented Apr 10, 2024

Should mixed precision be allowed when the op involves accumulation?

For example, would this be acceptable for conv2d (likewise, matmul, reduceSum, ...)?

conv2d(
   /* input */ fp16,
  /* weight */ fp16,
    /* bias */ fp16,
) => fp32

I think requiring input, weight and bias to be the same type is reasonable (callers shouldn't add/multiply matrices of different types).

We don't need to block on mixed-precision topics (because it's relaxing the constraints). A well-documented type constraint table will be a big spec improvement. 😄


My read of the DML doc is that conv(fp16) -> fp32 isn't permitted.

Does DML use an fp32 accumulator internally and then cast the result to fp16? Or is it fp16 accumulation all the way (which might saturate the range and yield Infinity / NaN)?

@fdwr

@fdwr
Collaborator

fdwr commented Apr 10, 2024

My read of the DML doc is that conv(fp16) -> fp32 isn't permitted.

@wacky6 You are correct - DML almost always requires the input and output data types to be the same (some notable exceptions are cast and quantize/dequantize operators).

Does DML use an fp32 accumulator internally and then cast the result to fp16? Or is it fp16 accumulation all the way (which might saturate the range and yield Infinity / NaN)?

It depends on the op (reduction needs more intermediate precision than simple addition) and which flags are passed (e.g. DML_EXECUTION_FLAG_ALLOW_HALF_PRECISION_COMPUTATION which can be faster on some devices, at the cost of precision).
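(As a rough numpy analogy, not how DML is actually implemented: accumulating in fp16 can saturate to infinity where a wider accumulator would not, which is the trade-off such a flag exposes.)

import numpy

x = numpy.ones(100_000, dtype=numpy.float16)
print(x.sum())                     # inf: the float16 accumulator overflows past 65504
print(x.sum(dtype=numpy.float32))  # 100000.0: a float32 accumulator avoids the overflow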

callers shouldn't add/multiply matrices of different types

I generally agree, as mixing multiple different input types would explode the test matrix and increase backend complexity.

@huningxin
Contributor Author

@inexorabletash

The table is missing:

Added the following ops into the table.

+1

  • softplus, softsign, tanh are mentioned in a comment as floating-point only

+1

  • gelu

floating-point only

  • layerNormalization

floating-point only

  • gru, gruCell

floating-point only

  • lstm, lstmCell

floating-point only

  • argmin/argmax (presumably anything, like most of the reduction ops)

argmin/argmax's output is int64

  • expand, gather, slice, transpose, triangular (presumably anything)

gather's indices is uint32 or int64

  • where (already in the spec: condition must be uint8, the other operands must have matching data types)

+1
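(Side note on the argmin/argmax point: numpy likewise returns a 64-bit index type on typical 64-bit builds, though that is platform dependent; just an illustration.)

import numpy

idx = numpy.argmax(numpy.array([1.0, 3.0, 2.0], dtype=numpy.float16))
print(idx, idx.dtype)  # 1 int64 (numpy.intp: platform dependent, int64 on 64-bit builds)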

@inexorabletash inexorabletash self-assigned this Apr 10, 2024
@philloooo
Contributor

philloooo commented Apr 11, 2024

Hi,
I've compared it with what CoreML supports; the main differences are:

  • For almost all the ops that support all types, CoreML supports fp32, fp16, int32.
  • Argmax/argmin output is int32.
  • Scalar parameters, like gather indices and conv2d groups, have type int32.
  • For floating-point computation, the NPU only supports fp16. The CPU and GPU support both fp32 and fp16.
  • Overall model input/output: fp16, fp32, int32. So if the last op of the model generates int8, it errors out.

If WebNN declares a wider type set, I think we would need some way to feature-detect (#463).

inexorabletash added a commit to inexorabletash/webnn that referenced this issue Apr 18, 2024
Introduce constraints for input operands, either directly (e.g.
input's dataType can only be "float32" or "float16") or indirectly
(e.g. weight's dataType must be the same as input's).

Fixes webmachinelearning#283
@fdwr fdwr closed this as completed in #646 Apr 26, 2024
fdwr added a commit that referenced this issue Apr 26, 2024
* Specify the operand data type constraints of operations

Introduce constraints for input operands, either directly (e.g.
input's dataType can only be "float32" or "float16") or indirectly
(e.g. weight's dataType must be the same as input's).

Fixes #283

* gruCell: bundle hiddenState in with other type validations

* Identity should accept all types

* Add reduceMean restriction

* Update gemm to check c data type too

---------

Co-authored-by: Dwayne Robinson <[email protected]>
@fdwr
Collaborator

fdwr commented May 2, 2024

For almost all the ops that support all types, CoreML support fp32, fp64, int32.

@philloooo The float64 type is interesting because, in contrast, DML doesn't support float64 for any math operators (only for bitwise data-movement operators like transpose, gather and bitwise and/or, plus cast at the edges of partitions), yet it does support int64, whereas CoreML supports float64 but doesn't support int64 for any operators. 🤹

If WebNN declares wider type set, I think we would need some way to feature detect(#463).

Agreed. Differences across implementations will be inevitable (similarly, WebGPU doesn't support all GPUTextureFormats, like astc-4x4-unorm, across all GPUs), but if they can be tested up front, that avoids wastefully loading a model only to fail later. And if any differences occur broadly (e.g. entire data types missing), that's easier to deal with than sporadic differences (certain operators missing), because you could easily generate two separate models with different data types, but it would be tougher to generate multiple models for various permutations of spotty operator support. The MLContext is probably the right place to ask, before you even make a graphBuilder, e.g.:

interface MLContext {
    Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);

+   boolean isTypeSupported(MLOperandDataType dataType);
};

@philloooo
Contributor

Oops @fdwr, sorry, that was a typo: it's fp16, not fp64. Just edited. For almost all the ops that support all types, CoreML supports fp32, fp16, int32.

@inexorabletash
Member

I know it's bad form to comment on a closed issue, but... the table lists this for ReLU and PReLU:

  • input: float32, float16, int32, int8

@philloooo points out that most other activations support only floats, and #283 (comment) says that prelu should only accept floats?

@huningxin - is this intentional?

@huningxin
Contributor Author

@inexorabletash

@huningxin - is this intentional?

Yes. The intention is to have them accept signed values.

#283 (comment) says that prelu should only accept floats?

In the Chromium CL review, we'd like to prototype prelu by starting from floating-point data types for the sake of simplicity.

@huningxin
Contributor Author

@fdwr , if I read the DML doc correctly, L1, SUM_SQUARE, MULTIPLY, SUM reduce functions only support 32- and 64-bit integers.

https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_reduce_operator_desc

Added the missing int64 and uint64 support for reduceL1, reduceSum, reduceProduct and reduceSumSquare to the table above.

huningxin added a commit to huningxin/webnn that referenced this issue May 27, 2024
`reduceL1`, `reduceProduct`, `reduceSum` and `reduceSumSquare` already
support 32-bit integers. 64-bit integers should also be supported.

Fix webmachinelearning#283, webmachinelearning#694