Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpu: nvidia: Refactor to native parameters for matmul #2111

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ShanoToni
Copy link
Contributor

Description

This refactor is required to address current issues in nvidia backend matmul. Currently the use of runtime dimensions requires native-specific member objects to be initialised, which are specific to the dimensions (Tensor descriptors, matrix layouts). This is handled currently by cleaning up the members and re-initializing on execution. This is not thread safe.

The refactor proposes abstraction of the members to separate structs, which in the non-runtime dimension case would be initialised with the primitive descriptor and passed to the primitive on its respective creation. When not using runtime dimensions the parameters are cleaned up when the primitive is destroyed.

In the runtime dimension case, the parameters are initialised only with the non native specific members. Inside the execute call a copy is created of the new params struct and only then are the native specific structs initialised for the copy. When using runtime dimensions the params are cleaned up at the end of the host_task running the matmul.

Checklist

General

  • [ x ] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [ x ] Have you formatted the code using clang-format?

@ShanoToni ShanoToni requested a review from a team as a code owner September 23, 2024 17:07
@github-actions github-actions bot added the platform:gpu-nvidia Codeowner: @oneapi-src/onednn-gpu-nvidia label Sep 23, 2024
@ShanoToni ShanoToni changed the title Refactor to native parameters for matmul gpu: nvidia: Refactor to native parameters for matmul Sep 24, 2024
@@ -116,24 +122,25 @@ struct cudnn_matmul_t : cudnn_matmul_base_t {
}
return true;
}

std::shared_ptr<cublas_params> params_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why we want to have shared ownership over the params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be created in the primitive descriptor and it is used by the implementation, attempting to pass a unique_ptr back and forward from impl to primitive might not be desirable.

Copy link
Contributor

@densamoilov densamoilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring looks very nice to me, a long-awaited one to be honest. Thanks!

@ShanoToni ShanoToni force-pushed the cublas_matmul_runtime_refactor branch from 2341c03 to a950e3e Compare October 2, 2024 16:51
@ShanoToni ShanoToni force-pushed the cublas_matmul_runtime_refactor branch from a950e3e to b209e92 Compare October 2, 2024 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform:gpu-nvidia Codeowner: @oneapi-src/onednn-gpu-nvidia
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants