[DML] MatrixMultiplyIntegerToFloat #19608

raoanag · 2024-02-22T17:16:41Z

Description

DML Implementation for com.microsoft.MatMulIntegerToFloat

.\onnxruntime_test_all.exe --gtest_filter="*MatMulIntegerToFloat.*"
Note: Google Test filter = *MatMulIntegerToFloat.*
[==========] Running 22 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 22 tests from MatMulIntegerToFloat
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8S8 (620 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8S8 (497 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8S8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8S8 (488 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_S8S8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_S8S8 (503 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8U8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8U8 (495 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8U8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8U8 (488 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_U8U8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_U8U8 (492 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_U8X8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_U8X8 (502 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8U8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_S8U8 (452 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8U8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_S8U8 (454 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8U8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8U8 (446 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_S8U8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_S8U8 (508 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8S8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_NoBias_test_U8S8 (456 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8S8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_HasBias_test_U8S8 (455 ms)
[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_U8S8
[       OK ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_U8S8 (447 ms)
[ RUN      ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_U8S8
[       OK ] MatMulIntegerToFloat.HasZeroPoint_HasBias_test_U8S8 (465 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_U8U8
[       OK ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_U8U8 (111 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_U8S8
[       OK ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_U8S8 (115 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_S8S8
[       OK ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_S8S8 (114 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_S8U8
[       OK ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16_S8U8 (110 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16
[       OK ] MatMulIntegerToFloat.MatMulIntegerToFloat_FP16 (112 ms)
[ RUN      ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint
[       OK ] MatMulIntegerToFloat.MatMulInteger_With_ZeroPoint (337 ms)
[----------] 22 tests from MatMulIntegerToFloat (8679 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 1 test suite ran. (8680 ms total)
[  PASSED  ] 22 tests.
memleakdbg:
----- No memory leaks detected -----

Motivation and Context

Added CalculateMatMulIntegerToFloat to replace the CPU EP run as the test reference
Added more FP32 test cases to isolate all input datatype combinations
Added fixed input to the MatMulIntegerToFloat_FP16* test cases
onnxruntime/test/testdata/matmul_integer_to_float.py is capable of generating FP16 models, but we do not produce any for now
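
For orientation, the kind of reference result a helper such as CalculateMatMulIntegerToFloat has to produce can be sketched in NumPy. This is a minimal sketch, assuming the operator computes an integer matmul of the zero-point-adjusted inputs, scales it by a_scale * b_scale, and adds an optional bias. The function name, argument order, and example values below are hypothetical and are not the test helper from this PR.

```python
import numpy as np


def matmul_integer_to_float_reference(a, b, a_scale, b_scale,
                                      a_zero_point=None, b_zero_point=None,
                                      bias=None):
    """Hypothetical NumPy reference; not the ORT test helper itself."""
    # Widen to int32 first so uint8/int8 inputs cannot overflow or wrap.
    a32 = a.astype(np.int32)
    b32 = b.astype(np.int32)
    if a_zero_point is not None:
        a32 = a32 - np.asarray(a_zero_point, dtype=np.int32)
    if b_zero_point is not None:
        b32 = b32 - np.asarray(b_zero_point, dtype=np.int32)
    acc = a32 @ b32  # integer accumulation
    scale = np.asarray(a_scale, dtype=np.float32) * np.asarray(b_scale, dtype=np.float32)
    y = acc.astype(np.float32) * scale
    if bias is not None:
        y = y + np.asarray(bias, dtype=np.float32)
    return y


# U8 x S8 example with a zero point on A and a bias (values chosen for illustration).
a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
b = np.array([[1, -1], [2, 0]], dtype=np.int8)
y = matmul_integer_to_float_reference(a, b, a_scale=0.5, b_scale=0.25,
                                      a_zero_point=1, bias=[1.0, 2.0])
print(y)
```

Accumulating in int32 before the single float multiply keeps the reference independent of any one execution provider's rounding behaviour, which is the point of not reusing a CPU EP run as the expected output.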

[Cherry Pick Reviewed] Commit all MatrixMultiplyIntegerToFloat PRs:
- [MatrixMultiplyIntegerToFloat](https://github.com/microsoft/onnxruntime/pull/18275/commits/bf642a4d35691a13ff0ecef11cb8a9571c5a5610) (https://github.com/microsoft/onnxruntime/pull/16804)
- [MatMulIntToFloat Enable FP16 and update tensor ORT-DML indexing](https://github.com/microsoft/onnxruntime/pull/18275/commits/8237548d14f11a165a9b82bf181f8762e65f6142) (https://github.com/microsoft/onnxruntime/pull/16871)
- [Disable MatMulIntegerToFloat transformation for FP16 on CPU EP](https://github.com/microsoft/onnxruntime/pull/18275/commits/b16bf809dea31872ccb664f2622711966078e3f5) (https://github.com/microsoft/onnxruntime/pull/18239)

### Description
MatMulIntegerToFloat tests were noticed to be failing for DMLEP, the root cause being inaccuracies in the CPUEP implementation for some data type combinations.

```
.\onnxruntime_test_all.exe --gtest_filter="*MatMulIntegerToFloat.*"
[==========] 22 tests from 1 test suite ran. (8680 ms total)
[  PASSED  ] 22 tests.
memleakdbg:
----- No memory leaks detected -----
```

### Motivation and Context
* `CalculateMatMulIntegerToFloat` to replace the CPU EP run as the test reference
* Added more FP32 test cases to isolate all input datatype combinations
* Added fixed input to the `MatMulIntegerToFloat_FP16*` test cases, because for FP16 test cases there is no support for direct `onnxruntime::MLFloat16` datatype comparison with the gtest framework. This leads to an FP32 reference -> FP16 tensor -> FP32 reference conversion, which adds inaccuracies.
![image](https://github.com/microsoft/onnxruntime/assets/127366241/c6aaf68e-44df-42be-9860-df2cb0dd7a56)
* Removing `MatMulIntegerToFloatHelper` as it is the same as `MatMulHelper`
* `onnxruntime/test/testdata/matmul_integer_to_float.py` is still capable of generating FP16 models, but we do not produce any for now

raoanag · 2024-03-01T19:27:02Z

/azp run Big Models, Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline

azure-pipelines · 2024-03-01T19:27:21Z

Azure Pipelines successfully started running 3 pipeline(s).

raoanag · 2024-03-01T19:40:31Z

/azp run Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

azure-pipelines · 2024-03-01T19:40:38Z

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

raoanag · 2024-03-01T19:41:44Z

/azp run Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

azure-pipelines · 2024-03-01T19:42:20Z

Azure Pipelines successfully started running 9 pipeline(s).

raoanag · 2024-03-01T19:42:34Z

/azp run Windows x64 QNN CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

azure-pipelines · 2024-03-01T19:42:55Z

Azure Pipelines successfully started running 5 pipeline(s).

tianleiwu · 2024-03-05T05:28:38Z

@raoanag, please adjust the test threshold (always use abs error and not relative error?). Here is an example test failure:
NoZeroPoint_HasBias_test_U8S8
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:272
The difference between cur_expected[i] and cur_actual[i] is 6.6161155700683594e-06, which exceeds *(params.relative_error) * std::abs(cur_expected[i]), where
cur_expected[i] evaluates to 0.00020684301853179932,
cur_actual[i] evaluates to 0.00020022690296173096, and
*(params.relative_error) * std::abs(cur_expected[i]) evaluates to 4.1368602978764102e-06.
i:211
Google Test trace:
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:484: provider type: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/providers/base_tester.cc:791: registered execution providers: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1183722559
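
The numbers in this failure can be replayed by hand. The sketch below assumes the check is "difference <= relative_error * |expected|", as the message text suggests, and assumes a relative error of 0.02, which is inferred from the logged bound divided by the expected value rather than read from the test source. The function name is hypothetical.

```python
def passes_relative_only(expected, actual, relative_error):
    """Relative-only closeness check, mirroring the failure message above."""
    return abs(expected - actual) <= relative_error * abs(expected)


expected = 0.00020684301853179932
actual = 0.00020022690296173096
relative_error = 0.02  # assumption: inferred from 4.1368602978764102e-06 / expected

diff = abs(expected - actual)
bound = relative_error * abs(expected)
print(f"diff={diff:.6e} bound={bound:.6e} "
      f"pass={passes_relative_only(expected, actual, relative_error)}")
```

The difference is only about 6.6e-06 in absolute terms, but because the expected value is itself about 2e-04, a 2% relative bound shrinks to about 4.1e-06 and the check fails.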

fs-eire · 2024-03-06T00:46:30Z

WebAssembly CI also fails on test "MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8S8".

build log link

[ RUN      ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8S8
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:272: Failure
The difference between cur_expected[i] and cur_actual[i] is 1.9073486328125e-06, which exceeds *(params.relative_error) * std::abs(cur_expected[i]), where
cur_expected[i] evaluates to 1.9073486328125e-06,
cur_actual[i] evaluates to 0, and
*(params.relative_error) * std::abs(cur_expected[i]) evaluates to 3.8146971803598717e-08.
i:497
Google Test trace:
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:484: provider type: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/providers/base_tester.cc:791: registered execution providers: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1298328783

/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:272: Failure
The difference between cur_expected[i] and cur_actual[i] is 1.9073486328125e-06, which exceeds *(params.relative_error) * std::abs(cur_expected[i]), where
cur_expected[i] evaluates to 1.9073486328125e-06,
cur_actual[i] evaluates to 0, and
*(params.relative_error) * std::abs(cur_expected[i]) evaluates to 3.8146971803598717e-08.
i:497
Google Test trace:
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:484: provider type: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/providers/base_tester.cc:791: registered execution providers: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1298328783

/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:272: Failure
The difference between cur_expected[i] and cur_actual[i] is 1.3113021850585938e-06, which exceeds *(params.relative_error) * std::abs(cur_expected[i]), where
cur_expected[i] evaluates to 1.3113021850585938e-06,
cur_actual[i] evaluates to -0, and
*(params.relative_error) * std::abs(cur_expected[i]) evaluates to 2.6226043559063328e-08.
i:497
Google Test trace:
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:484: provider type: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/providers/base_tester.cc:791: registered execution providers: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1298328783

/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:272: Failure
The difference between cur_expected[i] and cur_actual[i] is 1.3113021850585938e-06, which exceeds *(params.relative_error) * std::abs(cur_expected[i]), where
cur_expected[i] evaluates to 1.3113021850585938e-06,
cur_actual[i] evaluates to -0, and
*(params.relative_error) * std::abs(cur_expected[i]) evaluates to 2.6226043559063328e-08.
i:497
Google Test trace:
/mnt/vss/_work/1/s/onnxruntime/test/providers/checkers.cc:484: provider type: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/providers/base_tester.cc:791: registered execution providers: CPUExecutionProvider
/mnt/vss/_work/1/s/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1298328783

[  FAILED  ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_S8S8 (7 ms)
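
This failure is a sharper version of the same problem: when the actual value is 0, the difference equals |expected|, so a relative-only bound cannot be met for any relative error below 1. A small sketch, using the values from the log above and a hypothetical helper name:

```python
def passes_relative_only(expected, actual, relative_error):
    """Relative-only closeness check: |expected - actual| <= rel * |expected|."""
    return abs(expected - actual) <= relative_error * abs(expected)


expected = 1.9073486328125e-06  # equals 2**-19
actual = 0.0

# With actual == 0 the difference is |expected| itself, so only rel >= 1 passes.
results = {rel: passes_relative_only(expected, actual, rel)
           for rel in (0.02, 0.5, 0.99, 1.0)}
print(results)
```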

### Description
Check float/double/float16/bfloat16 tensors are close like [numpy.isclose](https://numpy.org/doc/stable/reference/generated/numpy.isclose.html).

```
absolute(a - b) <= (atol + rtol * absolute(b))
```

The default tolerance thresholds:
- float: atol=1e-5 and rtol=1e-4
- float16: atol=0.0025 and rtol=0.001
- bfloat16: atol=0.02 and rtol=0.01

### Motivation and Context
Current pipeline has frequent failure due to using only relative tolerance in #19608:

[ RUN ] MatMulIntegerToFloat.NoZeroPoint_NoBias_test_U8S8
1: C:\a\_work\1\s\onnxruntime\test\providers\checkers.cc(272): error: The difference between cur_expected[i] and cur_actual[i] is 1.3113021850585938e-06, which exceeds *(params.relative_error) * std::abs(cur_expected[i]), where
1: cur_expected[i] evaluates to -1.3113021850585938e-06,
1: cur_actual[i] evaluates to 0, and
1: *(params.relative_error) * std::abs(cur_expected[i]) evaluates to 2.6226043559063328e-08.

It is not reasonable to use relative tolerance for a small value very close to 0. Combining relative tolerance with a positive absolute tolerance could avoid such issue.
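
The combined check can be tried against the failing values quoted in this thread. This is a minimal sketch that uses `numpy.isclose` directly with the float defaults listed above; the dictionary and helper names are hypothetical and do not come from the onnxruntime test code.

```python
import numpy as np

# (atol, rtol) pairs as listed in the description above.
DEFAULT_TOLERANCES = {
    "float": (1e-5, 1e-4),
    "float16": (0.0025, 0.001),
    "bfloat16": (0.02, 0.01),
}


def is_close(actual, expected, atol, rtol):
    """absolute(actual - expected) <= atol + rtol * absolute(expected)."""
    return bool(np.isclose(actual, expected, rtol=rtol, atol=atol))


atol, rtol = DEFAULT_TOLERANCES["float"]

# Near-zero case from the U8S8 failure: relative-only failed, combined passes.
near_zero_ok = is_close(0.0, -1.3113021850585938e-06, atol, rtol)

# Small-magnitude case from the earlier NoZeroPoint_HasBias_test_U8S8 failure.
small_value_ok = is_close(0.00020022690296173096, 0.00020684301853179932, atol, rtol)

# A genuinely wrong result is still rejected.
wrong_ok = is_close(1.0, 1.1, atol, rtol)

print(near_zero_ok, small_value_ok, wrong_ok)
```

The absolute term dominates near zero, where a relative bound collapses, while the relative term keeps the check meaningful for large magnitudes.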


github-advanced-security bot found potential problems Feb 22, 2024

onnxruntime/test/testdata/transform/fusion/matmul_integer_to_float.py (fixed)

raoanag force-pushed the user/anarao/DMLEPMatMulInt2Flt branch 5 times, most recently from ce6aa75 to 3cec30a Compare February 23, 2024 19:10

raoanag marked this pull request as ready for review February 23, 2024 20:09

raoanag requested a review from a team as a code owner February 23, 2024 20:09

raoanag requested review from yufenglee and tbqh February 23, 2024 20:09

raoanag and others added 6 commits February 27, 2024 12:11

Doc updates

9cceffa

Resolve conflicts

1c74a29

Lint runner

88f988e

adding back 120 character

795241c

raoanag force-pushed the user/anarao/DMLEPMatMulInt2Flt branch from 139236a to ddf8f78 Compare February 27, 2024 21:28

Linx Build fix

6fe223c

raoanag force-pushed the user/anarao/DMLEPMatMulInt2Flt branch from ddf8f78 to 6fe223c Compare February 28, 2024 00:41

tbqh previously approved these changes Feb 28, 2024

yufenglee previously approved these changes Feb 29, 2024

update constexpr Linix build error

a4c8158

raoanag dismissed stale reviews from yufenglee and tbqh via a4c8158 March 1, 2024 20:17

raoanag added 2 commits March 1, 2024 14:46

Update tolerance

577706f

Increase tolerance for CPU

66c21b2

yufenglee approved these changes Mar 4, 2024

raoanag merged commit 27b1dc9 into main Mar 4, 2024
93 of 95 checks passed

raoanag deleted the user/anarao/DMLEPMatMulInt2Flt branch March 4, 2024 19:55

tianleiwu mentioned this pull request Mar 6, 2024

Update tolerance of provider tests to fix flaky tests #19792

Merged
