[CPU] Enable compressed FC via oneDNN Matmul primitive #27459

dmitry-gorokhov · 2024-11-07T13:03:08Z

Details:

This PR enables execution of FullyConnected operations via the oneDNN Matmul primitive.
The Matmul_weights_decompression tests are split into x64 and ARM instances; the ARM tests run well via ref matmul.
The newly added functionality is still gated behind debug caps. To try it out:
-- Build OV with the -DENABLE_DEBUG_CAPS=ON cmake option
-- export OV_CPU_ENABLE_DNNL_MAMTUL_FOR_FC=1

dmitry-gorokhov · 2024-11-13T09:46:13Z

src/plugins/intel_cpu/tests/functional/cmake/target_per_test.cmake

@@ -64,11 +64,11 @@ endif()

    # find all the source files with the name of a class file
    if(X86_64)
-        file(GLOB_RECURSE LIST_OF_TEST_ARCH_INSTANCES ${TEST_DIR}/instances/x64/${TEST_CLASS_FILE_NAME})
+        file(GLOB_RECURSE LIST_OF_TEST_ARCH_INSTANCES ${TEST_DIR}/x64/${TEST_CLASS_FILE_NAME})


TODO: remove before merge

dmitry-gorokhov · 2024-11-13T09:48:33Z

src/plugins/intel_cpu/src/cpu_memory.cpp

@@ -42,7 +42,7 @@ namespace {
        if (!ftz) {
            return;
        }
-        if (src.getDesc().getPrecision() != ov::element::f32 || dst.getDesc().getPrecision() == ov::element::bf16) {
+        if (src.getDesc().getPrecision() != ov::element::f32 || dst.getDesc().getPrecision() != ov::element::f32) {


@maxnick Please take a look. I'm not sure about the idea behind the previous logic; maybe you remember something.
In any case, the previous condition was incorrect for cases like src (fp32) -> dst (i32): it tried to apply FTZ to i32 data.

maxnick · 2024-11-13

The original idea was to skip denormal flushing for all source types other than float (f32), and also for the bf16 dst type. On the source side, flushing doesn't make sense for integer types, and for bf16 the HW implicitly assumes that the FTZ and DAZ flags are set when performing bf16-specific operations. On the destination side, the HW already flushes denormals to zero while converting float to bf16 using specific instructions.
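
For illustration only, here is a minimal standalone sketch of the two conditions discussed in this thread. It is not the plugin code: the `Precision` enum and all function names are hypothetical stand-ins for the `ov::element` checks in `cpu_memory.cpp`, and the flushing loop is a simplified assumption of what "apply FTZ" means here (it writes +0.0f, so the sign of a negative denormal is dropped).

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the ov::element precisions used in the diff.
enum class Precision { f32, bf16, i32 };

// Previous condition: skip when src is not f32 or dst is bf16.
// Note that f32 -> i32 is NOT skipped, which is the reported problem.
bool shouldFlushOld(bool ftz, Precision src, Precision dst) {
    if (!ftz) {
        return false;
    }
    return !(src != Precision::f32 || dst == Precision::bf16);
}

// Updated condition: flush only when both src and dst are f32.
// A bf16 dst is still skipped, so the original intent is preserved.
bool shouldFlushNew(bool ftz, Precision src, Precision dst) {
    if (!ftz) {
        return false;
    }
    return src == Precision::f32 && dst == Precision::f32;
}

// Simplified flush: replace every subnormal float with +0.0f.
void flushDenormalsToZero(std::vector<float>& data) {
    for (float& v : data) {
        if (std::fpclassify(v) == FP_SUBNORMAL) {
            v = 0.0f;
        }
    }
}
```

With these helpers, `shouldFlushOld(true, Precision::f32, Precision::i32)` evaluates to true while `shouldFlushNew` returns false for the same pair, which is the f32 -> i32 case described above.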

dmitry-gorokhov · 2024-11-13T09:53:15Z

@EgorDuplensky Please review the PR

dmitry-gorokhov added the category: CPU OpenVINO CPU plugin label Nov 7, 2024

dmitry-gorokhov added this to the 2025.0 milestone Nov 7, 2024

dmitry-gorokhov self-assigned this Nov 7, 2024

dmitry-gorokhov requested review from a team as code owners November 7, 2024 13:03

dmitry-gorokhov marked this pull request as draft November 7, 2024 13:03

github-actions bot added the category: build OpenVINO cmake script / infra label Nov 7, 2024

dmitry-gorokhov assigned EgorDuplensky and unassigned dmitry-gorokhov Nov 13, 2024

dmitry-gorokhov marked this pull request as ready for review November 13, 2024 09:53

dmitry-gorokhov force-pushed the feature/fc_matmul_with_decompression_executor branch from 7479c09 to 4137826 Compare November 18, 2024 09:11

dmitry-gorokhov force-pushed the feature/fc_matmul_with_decompression_executor branch 2 times, most recently from 7ffe60a to 370e2b2 Compare December 12, 2024 11:41

[CPU] Enable compressed FC via oneDNN Matmul primitive

9a336e2

dmitry-gorokhov force-pushed the feature/fc_matmul_with_decompression_executor branch from 370e2b2 to 9a336e2 Compare December 12, 2024 11:49
