Refactor C++ language extensions and C++ support #3673
base: docs/develop
e911a63 to da93aea
0c52420 to 31fa125
760259e to 6d6c430
058fdde to 3eddfa4
3eddfa4 to c485169
21b4cb3 to 021e579
@MKKnorr Please rebase on docs/develop.
ac2b239 to 761c1f8
58960c9 to 4b37cf5
4b37cf5 to 0cf8414
Looks ok to me. I left a few comments.
automatically reduce shared memory usage. The compilation fails, if the compiler
can not generate code that satisfies the launch bounds.

On NVCC this parameter maps to the ``.maxntid`` PTX directive.
Is this needed?
Yes and no. This and the "porting from CUDA __launch_bounds" section should go to the porting guide, but that is being done in another issue. I'll add a note to that issue.
HIP supports ``__threadfence()``, ``__threadfence_block()`` and
``__threadfence_system()``.

On AMD devices, ``__threadfence_system()``, has restrictions and therefore needs
This feels like it is referring to some implicit knowledge the user must have relative to threadfence_system? What are the "restrictions"? And is it appropriate to call it a workaround in the docs? Seems like a workaround is good for an issue or bug report, but not for user docs. Perhaps "requires the following process"?
I wish I had more information on this, but I couldn't find anything proving or disproving it, so I just reformatted the existing content.
kernel launch is present in the code.

To compile device code and include kernel launches, a compiler with full HIP
support is needed. On AMD platforms this is ``amdclang++``, whereas on NVIDIA
I would remove the reference to NVIDIA and nvcc.
Then I also wouldn't mention the other ones. It is kind of weird to say that "On AMD platforms this is amdclang, but you can also use hipcc which is a wrapper around it".
An important difference between the host and device code C++ support is
exception handling. In device code, exceptions aren't available due to
the hardware architecture. The device code must use return codes to handle
errors.
Can we add an example?
I think an example here would be overkill. Also, it is quite hard to show an example of what you cannot do.
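If an example is wanted after all, one option is to show the positive pattern (reporting errors through return codes) rather than what is forbidden. The following is only a minimal sketch and not taken from the PR: the `HD` macro, the `Status` enum and `checked_divide_at` are hypothetical names chosen for illustration. It assumes that `__HIPCC__` is defined when building with a HIP compiler, and it falls back to plain host C++ otherwise.

```cpp
#include <cstddef>

// HD is a hypothetical helper macro (an assumption, not part of HIP).
// Under a HIP or CUDA compiler it marks a function as callable from both
// host and device; in a plain C++ build it expands to nothing.
#if defined(__HIPCC__) || defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical status codes used in place of exceptions.
enum class Status : int { Ok = 0, OutOfRange = 1, DivideByZero = 2 };

// Computes data[index] / divisor and stores it in *out.
// Nothing is thrown: every failure is reported through the return value,
// and *out is left untouched when an error is returned.
HD inline Status checked_divide_at(const float* data, std::size_t size,
                                   std::size_t index, float divisor,
                                   float* out) {
    if (index >= size) {
        return Status::OutOfRange;
    }
    if (divisor == 0.0f) {
        return Status::DivideByZero;
    }
    *out = data[index] / divisor;
    return Status::Ok;
}
```

In a kernel, each thread could write the returned `Status` into an output buffer that the host inspects after the launch. That is one possible design; the docs under review do not prescribe it.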
Kernel Compilation
================================================================================

``hipcc`` now supports compiling C++/HIP kernels to binary code objects. The
Can we replace hipcc here with amdclang++?
Nope. As an aside, I think this whole section about "Kernel compilation" does not belong here, but there is also a separate issue about that.
ef46f97 to 00a1e34
No description provided.