[Migraphx EP] Static int8 QDQ support #17931
Conversation
Keeping this in draft until I can get some models quantized and run through your end-to-end examples found here: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization Let me know if there are any further pieces I should be adding or taking into consideration. I shall be uploading the finished end-to-end examples to the onnxruntime-inference-examples repo as well once completed.
@cloudhan @PeixuanZuo can you kick off CI for me? We're looking to get support for this out with our release. I've confirmed functionality with the tagged end-to-end example of resnet50.
@cloudhan can you enable CI again?
@TedThemistokleous FYI, for quick local development iteration to test the CI, you can grab the CI commands from other PRs.
Does that also run local linting? That's what seems to be failing here. I'm not seeing the training error. We're currently using the following for builds:
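For context, a rough sketch of the kind of build invocation involved, written as a small Python driver around ONNX Runtime's build.py. The flags and paths here are illustrative assumptions, not the exact command used in this thread:

```python
# Hypothetical sketch only: drive ONNX Runtime's build script with the
# MIGraphX EP enabled on a ROCm machine. Flags and paths are illustrative
# assumptions, not the exact command referenced above.
import subprocess

build_cmd = [
    "python3", "tools/ci_build/build.py",
    "--config", "Release",
    "--build_dir", "build/Linux",
    "--parallel",
    "--build_wheel",
    "--use_migraphx",                # build the MIGraphX execution provider
    "--migraphx_home", "/opt/rocm",  # assumed ROCm/MIGraphX install prefix
    "--rocm_home", "/opt/rocm",
    "--skip_tests",
]
subprocess.run(build_cmd, check=True)  # raise if the build fails
```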
Force-pushed from 35225d1 to b082a99
/azp run Linux MIGraphX CI Pipeline,orttraining-amd-gpu-ci-pipeline
Azure Pipelines successfully started running 2 pipeline(s).
@cloudhan there are nonsensical answers I'm getting from your linter. Running lintrunner on the onnxruntime root dir gives me the following:
@TedThemistokleous The CI log is weird because it stripped all whitespace changes, so the log itself makes no sense at all. All you need to do is run clang-format on that file and it should be formatted automatically. For lintrunner, you also need a local setup (a rough sketch follows below).
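As a sketch of that local lint workflow (the package names, file path, and flags below are assumptions about a typical lintrunner setup, not commands quoted from this thread):

```python
# Rough sketch of a local clang-format + lintrunner pass; package names,
# the file path, and flags are assumptions, not quoted from this thread.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# One-time setup: install lintrunner and let it pull in its linter adapters.
run(["python3", "-m", "pip", "install", "lintrunner", "lintrunner-adapters"])
run(["lintrunner", "init"])

# Format the offending file directly, then let lintrunner apply any remaining
# auto-fixes (-a) to the files touched by the PR.
run(["clang-format", "-i",
     "onnxruntime/core/providers/migraphx/migraphx_execution_provider.cc"])  # example path
run(["lintrunner", "-a"])
```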
@justinchuby Is it possible to set the Lint pipeline to run automatically without needing member approval?
Changed during format for line-length errors/lintrunner. Added delimiters
Force-pushed from 9e67aba to 7c51b94
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
/azp run onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
/azp run Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Linux MIGraphX CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
@TedThemistokleous, could you please take a look at the failed Lint C++ CI?
@PeixuanZuo there is something strange about linting. For some reason our ROCm-related PRs flag far more linting issues than other non-ROCm PRs, which really hampers how quickly we can land PRs. For example, it flags include/onnxruntime/core/session/onnxruntime_c_api.h:603, which is inside the struct OrtMIGraphXProviderOptions, but looking 10 lines above, inside the struct OrtTensorRTProviderOptions, there are 3 lines over 120 characters long. It's only flagging our newly added lines. Are we being held to a higher standard than other EPs?
Fix errors screaming in the Lint C++ pass. Run another lintrunner pass just in case we get conflicts again.
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline
Azure Pipelines successfully started running 9 pipeline(s).
/azp run onnxruntime-python-checks-ci-pipeline, orttraining-amd-gpu-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
Azure Pipelines successfully started running 10 pipeline(s).
Lint CI doesn't treat any EP as special. It lints only the changes in this PR rather than the whole codebase, so it only shows warnings about newly added lines.
Description
Adding static int8 quantization support for the MIGraphX Execution Provider:
- Allows parsing of calibration tables generated by ONNX Runtime's or TensorRT's toolsets
- Adds the proper environment variables to the MIGraphX EP
- Updates the Python API to allow setting execution provider flags (this was missing on the Python side)
- Hooks into MIGraphX's int8 quantization and optimization of models
Motivation and Context
Required so that onnxruntime can pass in models while leveraging the existing tooling for int8 static QDQ quantization.
First step in a series of PRs which will add further static quantization at the operator level as MIGraphX releases further support.
These changes drew heavily from the TensorRT EP and should allow for similar functionality for GPU-based (versus CPU) quantization of models before an inference is performed.
Co-authored-by: Ted Themistokleous <[email protected]>
Co-authored-by: Ted Themistokleous <[email protected]>
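A minimal usage sketch of the feature from Python follows. The provider option keys are assumptions modeled on the TensorRT EP options this work was based on, and the model/table file names are placeholders; check the MIGraphX EP documentation for the exact names.

```python
# Minimal sketch: enable MIGraphX int8 execution with a pre-generated
# calibration table. Option keys and file names below are assumptions
# modeled on the TensorRT EP, not confirmed exact names.
import onnxruntime as ort

migraphx_options = {
    "device_id": 0,
    "migraphx_int8_enable": True,                                       # assumed key
    "migraphx_int8_calibration_table_name": "calibration.flatbuffers",  # assumed key/file
    "migraphx_use_native_calibration_table": False,                     # assumed key
}

session = ort.InferenceSession(
    "resnet50_qdq.onnx",  # placeholder model path
    providers=[
        ("MIGraphXExecutionProvider", migraphx_options),
        "CPUExecutionProvider",  # fallback for nodes MIGraphX can't take
    ],
)
print([inp.name for inp in session.get_inputs()])
```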