Align GpuComplex to its size (AMReX-Codes#3691)
## Summary

As discussed in AMReX-Codes#3677, this PR makes the alignment of
`amrex::GpuComplex` stricter (aligned to its own size, `2*sizeof(T)`) to
allow for coalesced memory accesses of `GpuComplex` arrays by NVIDIA GPUs
such as the A100.
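A minimal sketch of what the `alignas(2*sizeof(T))` specifier does. It uses a hypothetical, stripped-down `GpuComplexSketch` rather than the real `amrex::GpuComplex`, which has many more members. For `double`, size and alignment both become 16 bytes; presumably this lets each element be fetched with a single 16-byte load instead of two 8-byte loads, which is what helps warp-wide accesses coalesce.

```cpp
#include <cassert>
#include <complex>

// Hypothetical stand-in for amrex::GpuComplex (not the real class):
// only the alignas(2*sizeof(T)) specifier from this commit is reproduced.
template <typename T>
struct alignas(2*sizeof(T)) GpuComplexSketch
{
    using value_type = T;
    T m_real;
    T m_imag;
};

// The struct is aligned to its own size.
static_assert(sizeof(GpuComplexSketch<float>)   == 8,  "two floats");
static_assert(alignof(GpuComplexSketch<float>)  == 8,  "aligned to its size");
static_assert(sizeof(GpuComplexSketch<double>)  == 16, "two doubles");
static_assert(alignof(GpuComplexSketch<double>) == 16, "aligned to its size");

// Size matches std::complex, so the layout stays bit-wise compatible;
// only the alignment requirement is stricter (or equal).
static_assert(sizeof(GpuComplexSketch<double>) == sizeof(std::complex<double>),
              "same size as std::complex<double>");
static_assert(alignof(GpuComplexSketch<double>) >= alignof(std::complex<double>),
              "at least as strict as std::complex<double>");
```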

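Note that this may break a `reinterpret_cast` from an array allocated as
`std::complex` to `amrex::GpuComplex`, because such an array is only
guaranteed the weaker alignment of `std::complex`. The other direction,
from `amrex::GpuComplex` to `std::complex`, remains safe.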
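The asymmetry can be checked with a small address test. This is a sketch assuming a hypothetical `GpuComplexSketch` stand-in and a hypothetical `is_aligned_for` helper (neither is AMReX API). Because alignments are powers of two, every address aligned for the 16-byte type is also aligned for `std::complex<double>` (8 bytes on common ABIs such as x86-64), but not vice versa.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>

// Hypothetical stand-in for amrex::GpuComplex; only the alignas matters here.
template <typename T>
struct alignas(2*sizeof(T)) GpuComplexSketch { T m_real; T m_imag; };

// Hypothetical helper: true if address p satisfies alignof(To),
// i.e. an object of type To could legally live there.
template <typename To>
bool is_aligned_for(const void* p) noexcept
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(To) == 0;
}

// GpuComplex -> std::complex is always safe: the stricter alignment
// is a multiple of the weaker one.
static_assert(alignof(GpuComplexSketch<double>) % alignof(std::complex<double>) == 0,
              "GpuComplex alignment is a multiple of std::complex alignment");
```

A `std::complex<double>` array that starts on an 8-byte but not 16-byte boundary (which can happen, for example, when it follows a `double` member in a struct) is valid as `std::complex`, yet reinterpreting it as `GpuComplex<double>*` would violate the new alignment.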

## Additional background

Typical allocators (malloc, amrex CArena) return memory aligned to 16
bytes, and CUDA allocators return memory aligned to 256 bytes; both satisfy
the 16-byte alignment required by `amrex::GpuComplex<double>`.
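When the allocator's guarantee is in doubt, the required alignment can be requested explicitly. This is a sketch only: `allocate_aligned` and `is_aligned_for` are hypothetical helpers, not AMReX functions, and it uses C++17 `std::aligned_alloc` on host memory rather than CArena or a CUDA allocator. Note that `malloc` itself only guarantees `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for amrex::GpuComplex; only the alignas matters here.
template <typename T>
struct alignas(2*sizeof(T)) GpuComplexSketch { T m_real; T m_imag; };

// Hypothetical helper: true if address p satisfies alignof(T).
template <typename T>
bool is_aligned_for(const void* p) noexcept
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: allocate n uninitialized elements of a trivial type T
// with the alignment T demands. std::aligned_alloc requires the size to be a
// multiple of the alignment, which holds because sizeof(T) is always a
// multiple of alignof(T). Release the memory with std::free.
template <typename T>
T* allocate_aligned(std::size_t n)
{
    void* p = std::aligned_alloc(alignof(T), n * sizeof(T));
    if (p == nullptr) { throw std::bad_alloc(); }
    return static_cast<T*>(p);
}
```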

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate
AlexanderSinn authored Jan 9, 2024
1 parent e0b77e1 commit 656bb64
Showing 1 changed file with 4 additions and 1 deletion: Src/Base/AMReX_GpuComplex.H
@@ -20,9 +20,12 @@ T norm (const GpuComplex<T>& a_z) noexcept;
  * work in device code with Cuda yet.
  *
  * Should be bit-wise compatible with std::complex.
+ *
+ * GpuComplex is aligned to its size (stricter than std::complex) to allow for
+ * coalesced memory accesses with nvidia GPUs.
  */
 template <typename T>
-struct GpuComplex
+struct alignas(2*sizeof(T)) GpuComplex
 {
     using value_type = T;
