-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle write-race conditions in CUDA kernels: Add atomic operation #1104
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1104 +/- ##
==========================================
+ Coverage 94.20% 94.22% +0.01%
==========================================
Files 48 48
Lines 8104 8132 +28
==========================================
+ Hits 7634 7662 +28
Misses 470 470
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
|
||
FunctionDecl* atomicAddFunc = nullptr; | ||
for (LookupResult::iterator it = lookupResult.begin(); | ||
it != lookupResult.end(); it++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: use range-based for loop instead [modernize-loop-convert]
it != lookupResult.end(); it++) { | |
for (auto decl : lookupResult) { |
lib/Differentiator/ReverseModeVisitor.cpp:1497:
- NamedDecl* decl = *it;
- // FIXME: check for underlying types of the pointers
+ // FIXME: check for underlying types of the pointers
|
||
FunctionDecl* atomicAddFunc = nullptr; | ||
for (LookupResult::iterator it = lookupResult.begin(); | ||
it != lookupResult.end(); it++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: use range-based for loop instead [modernize-loop-convert]
it != lookupResult.end(); it++) { | |
for (auto decl : lookupResult) { |
lib/Differentiator/ReverseModeVisitor.cpp:2317:
- NamedDecl* decl = *it;
- // FIXME: check for underlying types of the pointers
+ // FIXME: check for underlying types of the pointers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
m_Context.getTranslationUnitDecl()); | ||
|
||
FunctionDecl* atomicAddFunc = nullptr; | ||
for (auto decl : lookupResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'auto decl' can be declared as 'auto *decl' [llvm-qualified-auto]
for (auto decl : lookupResult) { | |
for (auto *decl : lookupResult) { |
m_Context.getTranslationUnitDecl()); | ||
|
||
FunctionDecl* atomicAddFunc = nullptr; | ||
for (auto decl : lookupResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'auto decl' can be declared as 'auto *decl' [llvm-qualified-auto]
for (auto decl : lookupResult) { | |
for (auto *decl : lookupResult) { |
7a1d405
to
6d6f853
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
auto* add_assign = BuildOp(BO_AddAssign, result, dfdx()); | ||
// Add it to the body statements. | ||
addToCurrentBlock(add_assign, direction::reverse); | ||
if (m_DiffReq->hasAttr<clang::CUDAGlobalAttr>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to go in this if
path when differentiating device
functions as well, right?
// Add it to the body statements. | ||
addToCurrentBlock(add_assign, direction::reverse); | ||
if (m_DiffReq->hasAttr<clang::CUDAGlobalAttr>()) { | ||
DeclarationName atomicAddId = &m_Context.Idents.get("atomicAdd"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create a separate function for finding the appropriate atomicAdd
function.
m_Context.getTranslationUnitDecl()); | ||
|
||
FunctionDecl* atomicAddFunc = nullptr; | ||
for (auto decl : lookupResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we use unresolved lookup in Sema::ActOnCallExpr
as we do in DerivativeBuilder::BuildCallToCustomDerivativeOrNumericalDiff
instead of explicitly finding the correct atomicAdd
declaration?
Expr* atomicCall = BuildCallExprToFunction(atomicAddFunc, atomicArgs); | ||
|
||
// Add it to the body statements. | ||
addToCurrentBlock(atomicCall, direction::reverse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please shift this entire if
path in a separate function such as CudaAddAssign
.
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
24a055f
to
8a12aca
Compare
clang-tidy review says "All clean, LGTM! 👍" |
3 similar comments
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
ed4e6e2
to
89bde88
Compare
clang-tidy review says "All clean, LGTM! 👍" |
@@ -104,6 +104,39 @@ Expr* getArraySizeExpr(const ArrayType* AT, ASTContext& context, | |||
return CladTapeResult{*this, PushExpr, PopExpr, TapeRef}; | |||
} | |||
|
|||
clang::Expr* ReverseModeVisitor::BuildCallToCudaAtomicAdd(clang::Expr* LHS, | |||
clang::Expr* RHS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the function not using RHS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great observation! It was left from the copy-paste. I'll fix it.
auto* add_assign = BuildOp(BO_AddAssign, result, dfdx()); | ||
// Add it to the body statements. | ||
addToCurrentBlock(add_assign, direction::reverse); | ||
if (m_DiffReq->hasAttr<clang::CUDAGlobalAttr>() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same if
-condition is repeated at two places. Can you please create a separate function for this such as: if (shouldUseCUDAAtomicOps())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I make this a variable or a function of ReverseModeVisitor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function sounds better to me.
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
20aef9d
to
cd74405
Compare
clang-tidy review says "All clean, LGTM! 👍" |
cd74405
to
4c7bea2
Compare
clang-tidy review says "All clean, LGTM! 👍" |
In case two or more threads read from the same address, when computing the reverse mode derivative this is translated into two or more threads writing in the same position:
A simple solution to this problem is to make the last plus-assign op atomic: