
[Migraphx EP] Static int8 QDQ support #17931

Merged
merged 33 commits into microsoft:main from the migraphx_int8_support branch
Nov 9, 2023

Conversation

TedThemistokleous
Contributor

Description

Adding static int8 quantization support for the MIGraphX Execution Provider.

  • Allows parsing of calibration tables generated by Onnxruntime's or TensorRT's toolsets
  • Adds the proper environment variables to the MIGraphX EP
  • Updates the Python API to expose the new execution provider flags, which were previously missing on the Python side (see the sketch below)
  • Hooks into MIGraphX's int8 quantization and optimization of models
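
For illustration, here is a minimal sketch of how these flags might be set through the Python API once this lands. The option keys are assumed to mirror the OrtMIGraphXProviderOptions fields shown later in this conversation, and the file names are placeholders, so treat it as a sketch rather than the definitive interface:

import onnxruntime as ort

# Assumed option keys, mirroring the OrtMIGraphXProviderOptions struct fields;
# file names below are hypothetical placeholders.
migraphx_options = {
    "device_id": 0,
    "migraphx_fp16_enable": 0,
    "migraphx_int8_enable": 1,
    "migraphx_int8_calibration_table_name": "calibration.flatbuffers",
    "migraphx_use_native_calibration_table": 0,  # 0 = table produced by ORT tooling rather than TensorRT's
}

session = ort.InferenceSession(
    "model_int8_qdq.onnx",  # hypothetical QDQ-quantized model
    providers=[("MIGraphXExecutionProvider", migraphx_options)],
)
print(session.get_providers())  # confirm the MIGraphX EP was picked up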

Motivation and Context

Required so that onnxruntime can run models through MIGraphX while leveraging the existing tooling for static int8 QDQ quantization.

This is the first in a series of PRs that will add further operator-level static quantization as MIGraphX releases additional support.

These changes drew heavily from the TensorRT EP and should allow similar functionality for GPU-based (versus CPU) quantization of models before an inference is performed.

@TedThemistokleous
Contributor Author

@cloudhan @PeixuanZuo

Keeping this in draft until I can get some models quantized and run them through your end-to-end examples found here: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization

Let me know if there are any other pieces you'd like me to add or take into consideration.

I'll also upload the finished end-to-end examples to the onnxruntime-inference-examples repo once they're completed.
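
For context, a minimal sketch of what that static int8 QDQ quantization step looks like with onnxruntime's quantization tooling (roughly what the linked end-to-end examples walk through). The model paths are placeholders and the random calibration reader stands in for real calibration data:

import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, QuantType, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    # Placeholder calibration data; a real reader should yield representative input batches.
    def __init__(self, model_path, num_batches=8):
        sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
        inp = sess.get_inputs()[0]
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # use 1 for dynamic dims
        self._data = iter(
            [{inp.name: np.random.rand(*shape).astype(np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._data, None)

quantize_static(
    "model_fp32.onnx",      # hypothetical float model
    "model_int8_qdq.onnx",  # output model with QuantizeLinear/DequantizeLinear (QDQ) nodes
    RandomCalibrationReader("model_fp32.onnx"),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)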

@TedThemistokleous
Contributor Author

@cloudhan @PeixuanZuo can you kick off CI for me? We're looking to get support for this out with our release.

I've confirmed functionality with the tagged end-to-end resnet50 example.

@cloudhan

This comment was marked as outdated.

@azure-pipelines

This comment was marked as outdated.

@TedThemistokleous
Contributor Author

@cloudhan can you enable CI again?

@cloudhan

This comment was marked as outdated.

@azure-pipelines

This comment was marked as outdated.

@cloudhan

This comment was marked as outdated.

@azure-pipelines

This comment was marked as outdated.

@cloudhan
Contributor

@TedThemistokleous FYI, for quick local development iteration to test the CI, you can grab the CI command from other PRs, e.g.
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1173339&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=66420422-c7d6-5f71-625c-4b7851c9b9ba&l=30. Modify it to suit local testing.

@TedThemistokleous
Contributor Author

TedThemistokleous commented Oct 19, 2023

@TedThemistokleous FYI, for quick local development iteration to test the CI, you can grab the CI command from other PRs, e.g. https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1173339&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=66420422-c7d6-5f71-625c-4b7851c9b9ba&l=30. Modify it to suit local testing.

python tools/ci_build/build.py --config Release --enable_training --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ onnxruntime_BUILD_KERNEL_EXPLORER=OFF onnxruntime_USE_COMPOSABLE_KERNEL=OFF --mpi_home /opt/ompi --use_migraphx --rocm_version=5.7 --rocm_home /opt/rocm --nccl_home /opt/rocm --update --build_dir /build --build --parallel --build_wheel --skip_submodule_sync --use_cache --skip_tests --cmake_path /usr/bin/cmake --ctest_path /usr/bin/ctest

Does that also run local linting? That's what seems to be failing here. I'm not seeing the training error locally. We're currently using the following for builds:

./build.sh --config Release --cmake_extra_defines CMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ --update --build --build_wheel --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --skip_tests --rocm_home /opt/rocm --use_migraphx --migraphx_home /opt/rocm --rocm_version=`cat /opt/rocm/.info/version-dev` --allow_running_as_root

@TedThemistokleous
Contributor Author

@cloudhan @ytaous need another kick for CI

@cloudhan

This comment was marked as outdated.

@azure-pipelines

This comment was marked as outdated.

@cloudhan
Contributor

/azp run Linux MIGraphX CI Pipeline,orttraining-amd-gpu-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@TedThemistokleous
Contributor Author

@cloudhan I'm getting nonsensical suggestions from your linter:

810  814 |                           << ". Please reference
    810      |-
https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
    811      |-to ensure all dependencies are met.";
         815 |+                             https :  // onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
         816 |+to ensure all dependencies are met.";

Running lintrunner on the onnxruntime root dir gives me the following:

Warning: Could not find a lintrunner config at: '.lintrunner.private.toml'. Continuing without using configuration file.
WARNING: No previous init data found. If this is the first time you're running lintrunner, you should run `lintrunner init`.
  RUFF failure
  BLACK-ISORT failure
  PYLINT success!
  RUSTFMT success!
  CLANGFORMAT failure


>>> General linter failure:

  Error (RUFF) Linter failed
    Linter failed. This a bug, please file an issue against the linter
    maintainer.
    
    CONTEXT:
    Failed to execute linter command python with args: ["-m",
    "lintrunner_adapters", "run", "ruff_linter", "--config=pyproject.toml",
    "@/tmp/.tmptFQL1X"]
  Error (CLANGFORMAT) Linter failed
    Linter failed. This a bug, please file an issue against the linter
    maintainer.
    
    CONTEXT:
    Failed to execute linter command python with args: ["-m",
    "lintrunner_adapters", "run", "clangformat_linter", "--binary=clang-
    format", "--fallback", "--", "@/tmp/.tmp2KvuTb"]
  Error (BLACK-ISORT) Linter failed
    Linter failed. This a bug, please file an issue against the linter
    maintainer.
    
    CONTEXT:
    Failed to execute linter command python with args: ["-m",
    "lintrunner_adapters", "run", "black_isort_linter", "--", "@/
    tmp/.tmp2PxX1i"]

@cloudhan
Contributor

@TedThemistokleous The CI log is weird because all whitespace changes were stripped from it, so the log itself makes no sense at all. All you need to do is run clang-format on that file and it will be formatted automatically.

For lintrunner, you also need lintrunner_adapters:

pip install lintrunner lintrunner_adapters

Running lintrunner <the file> on just the changed file, instead of the whole repo, will be more helpful.

@cloudhan
Contributor

For example, lintrunner include/onnxruntime/core/session/onnxruntime_c_api.h flags formatting issues in that header (screenshot of the lint output omitted).

lintrunner -a include/onnxruntime/core/session/onnxruntime_c_api.h will automatically fix it for you:

> git diff                                                                                                                                                                                               (base) 
diff --git a/include/onnxruntime/core/session/onnxruntime_c_api.h b/include/onnxruntime/core/session/onnxruntime_c_api.h
index c44b941c13..8f22f2f75e 100644
--- a/include/onnxruntime/core/session/onnxruntime_c_api.h
+++ b/include/onnxruntime/core/session/onnxruntime_c_api.h
@@ -598,12 +598,11 @@ typedef struct OrtTensorRTProviderOptions {
  * \see OrtApi::SessionOptionsAppendExecutionProvider_MIGraphX
  */
 typedef struct OrtMIGraphXProviderOptions {
-
-  int device_id;             // hip device id.
-  int migraphx_fp16_enable;  // enable MIGraphX FP16 precision. Default 0 = false, nonzero = true
-  int migraphx_int8_enable;  // enable MIGraphX INT8 precision. Default 0 = false, nonzero = true
-  int migraphx_use_native_calibration_table;          // MIGraphx INT8 cal table. Default 0 = false, noznero = true
-  const char* migraphx_int8_calibration_table_name;   // MIGraphx INT8 calibration table name
+  int device_id;                                     // hip device id.
+  int migraphx_fp16_enable;                          // enable MIGraphX FP16 precision. Default 0 = false, nonzero = true
+  int migraphx_int8_enable;                          // enable MIGraphX INT8 precision. Default 0 = false, nonzero = true
+  int migraphx_use_native_calibration_table;         // MIGraphx INT8 cal table. Default 0 = false, noznero = true
+  const char* migraphx_int8_calibration_table_name;  // MIGraphx INT8 calibration table name
 } OrtMIGraphXProviderOptions;
 
 /** \brief OpenVINO Provider Options

@TedThemistokleous
Contributor Author

Looks like there's an assumption here about Python versions. I got it sorted and reran everything.

@TedThemistokleous changed the title from Migraphx Static int8 QDQ support to [Migraphx EP] Static int8 QDQ support on Oct 23, 2023
@cloudhan
Contributor

@justinchuby Is it possible to set the Lint pipeline to run automatically without needing member approval?

Ted Themistokleous added 2 commits November 6, 2023 20:02
Changed during format for line length errors/lintrunner. Added delimiters
@TedThemistokleous
Contributor Author

TedThemistokleous commented Nov 6, 2023

@cloudhan @faxu rebased the changes off main since the eigen issue was breaking local builds.

@faxu
Contributor

faxu commented Nov 6, 2023

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

@faxu
Contributor

faxu commented Nov 6, 2023

/azp run onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed,
ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline, Linux QNN CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@faxu
Contributor

faxu commented Nov 7, 2023

/azp run Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@PeixuanZuo
Contributor

/azp run Linux MIGraphX CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@PeixuanZuo
Contributor

@TedThemistokleous, could you please take a look at the failed Lint C++ CI?

@jeffdaily
Contributor

@TedThemistokleous, could you please take a look at the failed Lint C++ CI?

@PeixuanZuo there is something strange about the linting. For some reason our ROCm-related PRs flag far more lint issues than non-ROCm PRs, and it really hampers how quickly we can land PRs. For example, it flags include/onnxruntime/core/session/onnxruntime_c_api.h:603, which is inside the struct OrtMIGraphXProviderOptions, but 10 lines above, inside the struct OrtTensorRTProviderOptions, there are 3 lines over 120 characters long. It's only flagging our newly added lines. Are we being held to a higher standard than other EPs?

jeffdaily-ms previously approved these changes Nov 8, 2023
Fix errors screaming in lint C++ pass
Run another lintrunner pass just in case we get conflicts again
@PeixuanZuo
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@PeixuanZuo
Contributor

/azp run onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed,
ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline, Linux QNN CI Pipeline, Windows x64 QNN CI Pipeline, Windows ARM64 QNN CI Pipeline, Linux MIGraphX CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 10 pipeline(s).

@PeixuanZuo
Contributor

The Lint CI doesn't treat any one EP as special. It runs lint on this PR's diff rather than on all code in the codebase, so it only shows warnings for newly added lines.

@PeixuanZuo merged commit 8d50313 into microsoft:main on Nov 9, 2023
69 checks passed
@TedThemistokleous deleted the migraphx_int8_support branch on November 9, 2023 at 14:49
TedThemistokleous added a commit to ROCm/onnxruntime that referenced this pull request Nov 10, 2023
jeffdaily pushed a commit to ROCm/onnxruntime that referenced this pull request Nov 10, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024