
attributes: remove obsolete quantization internals #2112

Conversation

dzarukin
Contributor

I'd like to start the process of improving the attributes internals responsible for quantization.
Two major changes affected that piece of functionality:

  • v3.0 moved from static to dynamic quantization.
  • v3.4 introduced data types and groups for quantization parameters.

Both changes left a pile of technical debt towards a better solution, and it started biting in attempts to extend groups and data type support further.

To succeed with the refactor, some obsolete entries must be removed and later replaced with the modern analogue of that functionality in the affected implementations.

This PR removes the oscale abstraction and everything related to it, as there's no way to invoke it through the public API. It also removes the defined() method as unneeded. The change touches all teams, as old pieces of that code are lying around in every backend. Please confirm you are fine with the change and will update quantization support in the corresponding backend if and as needed.

@dzarukin dzarukin requested review from a team as code owners September 23, 2024 17:39
@github-actions github-actions bot added platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 platform:gpu-nvidia Codeowner: @oneapi-src/onednn-gpu-nvidia platform:gpu-amd Codeowner: @oneapi-src/onednn-gpu-amd platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic labels Sep 23, 2024
@dzarukin
Contributor Author

@oneapi-src/onednn-cpu-aarch64 @oneapi-src/onednn-gpu-nvidia Kind ping to check on the changes and confirm you are fine with them.
Please respond by the end of the week (the earlier the better); otherwise, I'll assume the changes are good to go.

@mgouicem
Contributor

@theComputeKid @t4c1
Could you take a look at this PR? It removes output scaling in quite a few implementations (though those should not have been reachable under normal use of the API).
Reviewing this PR might help you re-enable this feature under the new API.

Contributor

@t4c1 t4c1 left a comment


SYCL/AMD/Nvidia changes look fine, but they are relatively small. The main question I have is: do we actually want to remove a feature without having a replacement ready?

@mgouicem
Contributor

mgouicem commented Sep 26, 2024

SYCL/AMD/Nvidia changes look fine, but they are relatively small. The main question I have is do we actually want to remove a feature without having a replacement ready?

We actually do have the replacement, see this RFC.

The output_scales mechanism was replaced with per-argument scales. In a nutshell, this allows:

  • aligning the scales mechanism with the zero-point mechanism;
  • getting scales at runtime instead of at creation time, improving primitive cache hit-rate and reducing caching overheads;
  • better matching users' storage of scales, sparing them the overhead of output_scales pre-computation.
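The cache hit-rate point can be illustrated with a small sketch. This is plain Python, not oneDNN code; every name in it is a hypothetical illustration of creation-time versus execution-time scales, not the library's actual API:

```python
# Conceptual sketch (not oneDNN code): why execution-time per-argument
# scales improve primitive cache hit-rate compared to creation-time
# output_scales. All names below are hypothetical.
from functools import lru_cache

@lru_cache(maxsize=None)
def create_primitive_static(output_scale):
    # Old scheme: the scale is baked in at creation time, so every
    # distinct scale value produces a distinct cached primitive.
    return ("matmul_kernel", output_scale)

@lru_cache(maxsize=None)
def create_primitive_dynamic():
    # New scheme: scales are execution-time arguments, so a single
    # cached primitive serves every scale value.
    return ("matmul_kernel",)

def execute_dynamic(primitive, src, src_scale, wei_scale):
    # Per-argument scales (here: source and weights) are applied at
    # execution time, mirroring dynamic quantization.
    return [x * src_scale * wei_scale for x in src]

for s in (0.5, 0.25, 0.125):
    create_primitive_static(s)   # three distinct cache entries
    create_primitive_dynamic()   # still a single cache entry

assert create_primitive_static.cache_info().currsize == 3
assert create_primitive_dynamic.cache_info().currsize == 1
print(execute_dynamic(create_primitive_dynamic(), [1.0, 2.0], 0.5, 0.25))
# → [0.125, 0.25]
```

With creation-time scales, every new scale value misses the primitive cache and pays the full creation cost; with runtime scales, the same compiled primitive is reused across all scale values.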

Internally, you have two options:

  • wait for Dima to finish the refactor of internals that simplifies working with quantization parameters, and switch to that;
  • move to the current arg_scales mechanism right now; a good example is in src/cpu/matmul/ref_matmul.{h,c}pp.

@theComputeKid
Contributor

@mgouicem we are currently in the process of making some changes in aarch64 for quantization and will check how it affects us. Will get back to you shortly.

cc: @renato-arantes

@theComputeKid
Contributor

Just checked, and all good from my side. From an aesthetic perspective, could you please trigger the new macOS pipelines with -Werror enabled to make sure the build passes? (You might need to rebase to run them.)

@dzarukin dzarukin force-pushed the github_intel/dzarukin/attributes_refactor_p1 branch from 2d1f560 to 7cdb506 Compare September 27, 2024 18:21
@dzarukin dzarukin closed this Oct 3, 2024