
attributes: remove obsolete quantization internals #2112

Conversation

dzarukin
Contributor

I'd like to start the process of improving the attributes internals responsible for quantization.
Two major changes affected that piece of functionality:

  • v3.0 moved from static to dynamic quantization.
  • v3.4 introduced data types and groups for quantization parameters.

Both changes left a pile of technical debt towards a better solution, and it started biting in attempts to extend groups and data type support further.

To succeed with the refactor, some obsolete entries must be removed and later replaced with the modern analogue of that functionality in the affected implementations.

This PR removes the oscale abstraction and everything related to it, as there's no way to invoke it through the public API. It also removes the defined() method as unneeded. The change touches all teams, as old pieces of that code are lying around in every backend. Please confirm you are fine with the change and will update quantization support in the corresponding backend if and as needed.

@dzarukin dzarukin requested review from a team as code owners September 23, 2024 17:39
@github-actions github-actions bot added platform:cpu-x64 Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64 platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 platform:gpu-nvidia Codeowner: @oneapi-src/onednn-gpu-nvidia platform:gpu-amd Codeowner: @oneapi-src/onednn-gpu-amd platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel platform:gpu-generic Codeowner: @oneapi-src/onednn-gpu-generic labels Sep 23, 2024
@dzarukin
Contributor Author

@oneapi-src/onednn-cpu-aarch64 @oneapi-src/onednn-gpu-nvidia Kind ping to check on the changes and confirm you are fine with them.
Please respond by the end of the week (the earlier the better); otherwise, I'll assume the changes are good to go.

@mgouicem
Contributor

@theComputeKid @t4c1
Could you take a look at this PR? It removes output scaling in quite a few implementations (though those should not have been reachable under normal use of the API).
Reviewing this PR might help you re-enable this feature under the new API.

Contributor

@t4c1 t4c1 left a comment


SYCL/AMD/Nvidia changes look fine, but they are relatively small. The main question I have is: do we actually want to remove a feature without having a replacement ready?

@mgouicem
Contributor

mgouicem commented Sep 26, 2024

SYCL/AMD/Nvidia changes look fine, but they are relatively small. The main question I have is do we actually want to remove a feature without having a replacement ready?

We actually do have the replacement, see this RFC.

The output_scales mechanism was replaced with per-argument scales. In a nutshell, this allows:

  • aligning the scales mechanism with the zero-point mechanism;
  • getting scales at runtime instead of at creation time, improving primitive cache hit-rate and reducing caching overheads;
  • better matching users' storage of scales, sparing them the overhead of output_scales pre-computation.
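The cache hit-rate point can be illustrated with a small sketch. This is plain Python, not oneDNN code; every name in it is a hypothetical illustration of creation-time versus execution-time scales, not the library's actual API:

```python
# Conceptual sketch (not oneDNN code): why execution-time per-argument
# scales improve primitive cache hit-rate compared to creation-time
# output_scales. All names below are hypothetical.
from functools import lru_cache

@lru_cache(maxsize=None)
def create_primitive_static(output_scale):
    # Old scheme: the scale is baked in at creation time, so every
    # distinct scale value produces a distinct cached primitive.
    return ("matmul_kernel", output_scale)

@lru_cache(maxsize=None)
def create_primitive_dynamic():
    # New scheme: scales are execution-time arguments, so a single
    # cached primitive serves every scale value.
    return ("matmul_kernel",)

def execute_dynamic(primitive, src, src_scale, wei_scale):
    # Per-argument scales (here: source and weights) are applied at
    # execution time, mirroring dynamic quantization.
    return [x * src_scale * wei_scale for x in src]

for s in (0.5, 0.25, 0.125):
    create_primitive_static(s)   # three distinct cache entries
    create_primitive_dynamic()   # still a single cache entry

assert create_primitive_static.cache_info().currsize == 3
assert create_primitive_dynamic.cache_info().currsize == 1
print(execute_dynamic(create_primitive_dynamic(), [1.0, 2.0], 0.5, 0.25))
# → [0.125, 0.25]
```

With creation-time scales, every new scale value misses the primitive cache and pays the full creation cost; with runtime scales, the same compiled primitive is reused across all scale values.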

Internally, you have two options:

  • wait for Dima to finish the refactor of internals that simplifies working with quantization parameters, and switch to that;
  • move to the current arg_scales mechanism right now; a good example is in src/cpu/matmul/ref_matmul.{h,c}pp.

@theComputeKid
Contributor

@mgouicem we are currently in the process of making some changes in aarch64 for quantization and will check how it affects us. Will get back to you shortly.

cc: @renato-arantes

@theComputeKid
Contributor

Just checked, and all good from my side. From an aesthetic perspective, could you please trigger the new macOS pipelines with -Werror enabled to make sure the build passes? (You might need to rebase to run them.)

@dzarukin dzarukin force-pushed the github_intel/dzarukin/attributes_refactor_p1 branch from 2d1f560 to 7cdb506 Compare September 27, 2024 18:21
@dzarukin dzarukin closed this Oct 3, 2024