Add support for CUDA device pullbacks #1111
Conversation
clang-tidy made some suggestions
/// the derivative (gradient) is being computed. This is separate from the
/// m_Variables map because all other intermediate variables will
/// not be stored here.
std::unordered_map<const clang::ValueDecl*, clang::Expr*> m_ParamVariables;
warning: member variable 'm_ParamVariables' has protected visibility [cppcoreguidelines-non-private-member-variables-in-classes]
std::unordered_map<const clang::ValueDecl*, clang::Expr*> m_ParamVariables;
^
if (OverloadedDerivedFn && asGrad) {
  // Derivative was found.
  FunctionDecl* fnDecl =
      dyn_cast<CUDAKernelCallExpr>(OverloadedDerivedFn)->getDirectCallee();
warning: Called C++ object pointer is null [clang-analyzer-core.CallAndMessage]
dyn_cast<CUDAKernelCallExpr>(OverloadedDerivedFn)->getDirectCallee();
^
Additional context
lib/Differentiator/ErrorEstimator.cpp:444: Assuming 'OverloadedDerivedFn' is non-null
if (OverloadedDerivedFn && asGrad) {
^
lib/Differentiator/ErrorEstimator.cpp:444: Left side of '&&' is true
if (OverloadedDerivedFn && asGrad) {
^
lib/Differentiator/ErrorEstimator.cpp:444: Assuming 'asGrad' is true
if (OverloadedDerivedFn && asGrad) {
^
lib/Differentiator/ErrorEstimator.cpp:444: Taking true branch
if (OverloadedDerivedFn && asGrad) {
^
lib/Differentiator/ErrorEstimator.cpp:447: Assuming 'OverloadedDerivedFn' is not a 'CastReturnType'
dyn_cast<CUDAKernelCallExpr>(OverloadedDerivedFn)->getDirectCallee();
^
lib/Differentiator/ErrorEstimator.cpp:447: Called C++ object pointer is null
dyn_cast<CUDAKernelCallExpr>(OverloadedDerivedFn)->getDirectCallee();
^
    const clang::CUDAKernelCallExpr*& KCE, clang::Expr*& OverloadedDerivedFn,
    llvm::SmallVectorImpl<clang::Expr*>& derivedCallArgs,
    llvm::SmallVectorImpl<clang::Expr*>& ArgResult, bool asGrad) {
  for (auto source : m_Sources) {
warning: 'auto source' can be declared as 'auto *source' [llvm-qualified-auto]
for (auto source : m_Sources) {
for (auto *source : m_Sources) {
        DerivedCallArgs.front()->getType(), m_Context, 1));
    OverloadedDerivedFn = m_Builder.BuildCallToCustomDerivativeKernel(
        customPushforward, pushforwardCallArgs, getCurrentScope(),
        const_cast<DeclContext*>(FD->getDeclContext()), config);
warning: do not use const_cast [cppcoreguidelines-pro-type-const-cast]
const_cast<DeclContext*>(FD->getDeclContext()), config);
^
        clad::utils::ComputeEffectiveFnName(FD) + "_pullback";
    OverloadedDerivedFn = m_Builder.BuildCallToCustomDerivativeKernel(
        customPullback, pullbackCallArgs, getCurrentScope(),
        const_cast<DeclContext*>(FD->getDeclContext()), config);
warning: do not use const_cast [cppcoreguidelines-pro-type-const-cast]
const_cast<DeclContext*>(FD->getDeclContext()), config);
^
In a separate PR we should probably move the CUDA-specific builtins into a separate header.
clang-tidy made some suggestions
clang-tidy made some suggestions
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1111      +/-   ##
==========================================
+ Coverage   94.24%   94.26%   +0.02%
==========================================
  Files          48       48
  Lines        8164     8198      +34
==========================================
+ Hits         7694     7728      +34
  Misses        470      470
We should have separate PRs for enabling kernel calls in a host function and device function calls inside a kernel.
clang-tidy made some suggestions
clang-tidy made some suggestions
@@ -46,6 +46,8 @@ struct DiffRequest {
  clang::CallExpr* CallContext = nullptr;
  /// Args provided to the call to clad::gradient/differentiate.
  const clang::Expr* Args = nullptr;
  /// Indexes of global args of function as a subset of Args.
  std::unordered_set<size_t> GlobalArgsIndexes;
warning: member variable 'GlobalArgsIndexes' has public visibility [cppcoreguidelines-non-private-member-variables-in-classes]
std::unordered_set<size_t> GlobalArgsIndexes;
^
/// the derivative (gradient) is being computed. This is separate from the
/// m_Variables map because all other intermediate variables will
/// not be stored here.
std::unordered_set<const clang::ValueDecl*> m_ParamVarsWithDiff;
warning: member variable 'm_ParamVarsWithDiff' has protected visibility [cppcoreguidelines-non-private-member-variables-in-classes]
std::unordered_set<const clang::ValueDecl*> m_ParamVarsWithDiff;
^
@@ -51,6 +56,8 @@
  /// that will be put immediately in the beginning of derivative function
  /// block.
  Stmts m_Globals;
  /// Global args of the function.
  std::unordered_set<const clang::ParmVarDecl*> m_GlobalArgs;
warning: member variable 'm_GlobalArgs' has protected visibility [cppcoreguidelines-non-private-member-variables-in-classes]
std::unordered_set<const clang::ParmVarDecl*> m_GlobalArgs;
^
clang-tidy made some suggestions
Can you please rebase the pull-request on top of
Force-pushed from 0aba981 to aa25a25
Done. @kchristin22 we need a better commit message.
Force-pushed from aa25a25 to 52842b5
On it!
For this purpose, a deeper look into atomic ops had to be taken. Atomic ops can only be applied to global or shared GPU memory. Hence, we needed to identify which call args of the device function pullback are actually kernel args and, thus, global. The indexes of those args are stored in a container in the differentiation request for the internal device function and appended to the name of the pullback function. Later on, when deriving the encountered device function, the global call args are matched with the function's params based on their stored indexes. This way, the atomic ops are kept to the minimum necessary and no error arises.
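To make the distinction concrete, here is a minimal hedged illustration (hand-written, not output generated by Clad; the function and parameter names are made up): an adjoint accumulating into a kernel argument, i.e. global memory visible to many threads, needs an atomic update, while an adjoint accumulating into thread-local memory can stay a plain compound assignment.

// Hypothetical device pullback sketch; d_x_global points into global memory
// (a kernel argument), d_x_local into memory private to the calling thread.
__device__ void fn_pullback(double x, double d_y,
                            double* d_x_global, double* d_x_local) {
  // Many threads may accumulate into the same global adjoint slot, so the
  // update must be atomic (atomicAdd on double requires sm_60 or newer).
  atomicAdd(d_x_global, d_y);
  // The thread-local adjoint is not shared, so no atomic op is needed.
  *d_x_local += d_y;
}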
Force-pushed from 52842b5 to 871d538
Looks good, except for one nitpick comment.
clang-tidy made some suggestions
Done!
LGTM!
Add support for device pullback functions.

… (m_GlobalArgs)

2.1. If the user has called clad::gradient on the kernel, the m_GlobalArgs set is not empty. When a call is encountered, we iterate over the args of the call and identify which ones reside in global memory. If such args are found, we store their indexes in another container (m_GlobalArgsIndexes) and include it in the differentiation request of the device function. When the latter is derived later on using DerivePullback(), we check whether m_GlobalArgsIndexes is empty, or in other words whether any of its call args are global, and if so, we match them with the parameters of this function's signature (add them to the local m_GlobalArgs container). The same procedure is followed for nested calls (a usage sketch follows after the note below).

2.2. If the user didn't call clad::gradient on the kernel, but only on the device function, then we can't tell at compile time which runtime args are global. As a result, the user should copy the data to local memory first and back again.

NOTE: Currently Clad doesn't support variables in shared memory, which would be considered global as well.
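A hedged usage sketch of case 2.1 (every name below — device_fn, kernel, differentiate_kernel — is hypothetical and only for illustration): when clad::gradient is requested for the kernel itself, the kernel's pointer parameters are known to live in global memory, so the indexes of the corresponding call args are propagated into the derivative request of device_fn and only those adjoints are updated atomically in its generated pullback.

#include "clad/Differentiator/Differentiator.h"

__device__ double device_fn(double* in, size_t i) { return in[i] * in[i]; }

__global__ void kernel(double* in, double* out, size_t N) {
  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < N)
    out[i] = device_fn(in, i); // `in` and `out` are kernel (global) args
}

void differentiate_kernel() {
  // Deriving the kernel also derives a pullback for device_fn; the global
  // arg indexes recorded in the differentiation request tell Clad where
  // atomic ops are actually required.
  auto kernel_grad = clad::gradient(kernel);
  // Launching the generated kernel gradient follows Clad's CUDA interface
  // (grid/block configuration plus the original and adjoint buffers).
}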