-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a variable sized SoA macro definition. #4
base: clue_porting_to_cmssw
Are you sure you want to change the base?
Conversation
Applied it to define HGCCLUESoA.
@b-fontana , is this the kind of SoA definitions you were looking for? |
Of course the same can be applied to (uncalibreated) rec hits as well. |
Hi Eric, I'll try to have a look Thursday or Friday.
|
/* CUDA allocations are already aligned */ | ||
mMemCLUEDev = cms::cuda::make_device_unique<std::byte[]>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we mean different things here (I have to admit my comment in the code was misleading). I was not trying to align the overall memory block. I was instead making sure the total size of the allocated memory is a multiple of the warpSize (32), and further making sure that each variable within the SoA is also aligned, so that no warp must allocate unnecessary memory blocks (that is why I am later using pad_
when defining the layout of the SoA). If I would not do this, the only variable of the SoA which would surely be aligned would be the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the macros take care of aligning the arrays for efficient access in the GPU. The purpose of the macros is to avoid the repetition of alignment code in all SoA structures.
The code supposes that the data block itself is properly aligned (which is the case with the CUDA allocator, but not guaranteed on the CPU side).
The array alignment can be defined at runtime, but I found only compile-time solutions for allocation alignment on the CPU side.
The default 128 byte alignment should be correct in most situations (see link in the code comment).
#define _ASSIGN_SOA_COLUMN_OR_SCALAR_IMPL(IS_COLUMN, TYPE, NAME) \ | ||
BOOST_PP_CAT(NAME, _) = reinterpret_cast<TYPE *>(curMem); \ | ||
BOOST_PP_IIF(IS_COLUMN, \ | ||
curMem += ((nElements_ * sizeof(TYPE) / byteAlignment_) + 1) * byteAlignment_; \ | ||
, \ | ||
curMem += ((sizeof(TYPE) / byteAlignment_) + 1) * byteAlignment_; \ | ||
) | ||
|
||
#define _ASSIGN_SOA_COLUMN_OR_SCALAR(R, DATA, TYPE_NAME) \ | ||
_ASSIGN_SOA_COLUMN_OR_SCALAR_IMPL TYPE_NAME | ||
|
||
#define _ACCUMULATE_SOA_ELEMENT_IMPL(IS_COLUMN, TYPE, NAME) \ | ||
BOOST_PP_IIF(IS_COLUMN, \ | ||
ret += ((nElements * sizeof(TYPE) / byteAlignment) + 1) * byteAlignment; \ | ||
, \ | ||
ret += ((sizeof(TYPE) / byteAlignment) + 1) * byteAlignment; \ | ||
) | ||
|
||
#define _ACCUMULATE_SOA_ELEMENT(R, DATA, TYPE_NAME) \ | ||
_ACCUMULATE_SOA_ELEMENT_IMPL TYPE_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code takes care of aligning the blocks bytewise. Looking at it, a -1 is missing. I will fix that.
/* For CUDA applications, we align to the 128 bytes of the cache lines. \ | ||
* See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-memory-3-0 this is still valid \ | ||
* up to compute capability 8.X. \ | ||
*/ \ | ||
constexpr static size_t defaultAlignment = 128; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default alignment is defined here, and should be a good default.
Re-aligned stray line continuations.
Range checking is enabled by defining the DEBUG macro.
We can use |
Applied it to define HGCCLUESoA.
Possible modifications for getting automatic definition of variable sized SoA. The SoAs are defined with macros from
CUDADataFormats/Common/interface/SoAmacros.h
so the package should be add to test.