
Feature alpaka #8

Merged 62 commits on Jan 19, 2024
Commits
e681538
Start implementing the alpaka version of the algorithm
sbaldu Jun 26, 2023
9609dab
Add Points and PointsAlpaka data structures
sbaldu Jun 26, 2023
5be35d5
[feature_alpaka] write convolutional kernels with alpaka
slanzi00 Jun 26, 2023
3a2443c
Include cms VecArray in alpaka data formats
sbaldu Jun 27, 2023
4b26373
Corrections in Points and PointsAlpaka classes
sbaldu Jun 27, 2023
2bf91e7
Add draft of kernels for the clustering methods
slanzi00 Jun 28, 2023
91b5da4
Add copy/move constructors in PointsAlpaka and PointsAlpakaView
sbaldu Jun 29, 2023
03330bb
Add kernel for local density
sbaldu Jun 29, 2023
06e83d3
Add header for TilesAlpaka class
sbaldu Jun 29, 2023
f372afa
Add temporary main and data for testing
sbaldu Jun 29, 2023
0dd82c6
Add first draft of CLUEAlgoAlpaka class header
sbaldu Jun 29, 2023
eb73bbb
Written kernel for the search of the most energetic neighbour
sbaldu Jun 29, 2023
bfb971c
add KernelFindClusters and KernelAssignClusters
slanzi00 Jun 29, 2023
353013c
Fix non trivial copyability in the TilesAlpaka class
sbaldu Jun 30, 2023
2f376f9
Write methods for CLUEAlgoAlpaka
sbaldu Jun 30, 2023
6bfa829
Fix typing errors in CLUEAlgo
sbaldu Aug 3, 2023
48837bb
Debugging executions of alpaka kernels
sbaldu Aug 3, 2023
74f24d8
Fix std::invoke_result error in alpaka kernels
sbaldu Aug 4, 2023
5abbd6b
Add construction of tiles size in alpaka code
sbaldu Aug 5, 2023
d53c7b2
Add resize of output vectors in Points constructor
sbaldu Aug 8, 2023
3962e26
Add memcpy in make_clusters after execution of kernels. Make calc_til…
sbaldu Aug 9, 2023
d6790d7
Add header for run functions
sbaldu Aug 9, 2023
3e028d5
Structure binding of alpaka code for different backends
sbaldu Aug 9, 2023
e754156
Add compilation and binding of cuda version to makefile
sbaldu Aug 10, 2023
f9a58da
Add binding file for cuda
sbaldu Aug 10, 2023
c8f8cfd
Rewrite convolutional kernels and add method for extracting call oper…
sbaldu Aug 11, 2023
b11f072
Separate the binding of the convolutional kernels
sbaldu Aug 11, 2023
eab0a6c
Add convolutional kernels to the alpaka version of the algorithm
sbaldu Aug 11, 2023
c34ca03
Trying to restructure convolutional kernels without virtual functions
sbaldu Aug 17, 2023
51751aa
Testing new implementation of convolutional kernels
sbaldu Aug 17, 2023
b688057
Add pybind11 as submodule
sbaldu Aug 18, 2023
fa7b63d
Debugging gpu version (it doesn't find the device)
sbaldu Aug 21, 2023
074cbd5
Change initialization of devices vector
sbaldu Aug 24, 2023
c200320
Add information about the cuda architecture in the makefile
sbaldu Sep 4, 2023
9b37dfc
Use unsafe push_backs in tiles and fix globalBin
sbaldu Sep 4, 2023
fe97c96
Rewrite use of tiles and clean printouts
sbaldu Sep 4, 2023
da686d3
Use unsafe push_backs in alpaka kernels
sbaldu Sep 4, 2023
b708dbf
Use different working division when launching KernelAssignClusters
sbaldu Sep 5, 2023
3ca4208
Change max sizes of tiles VecArrays
sbaldu Sep 5, 2023
e49603d
Keep the number of tiles fixed and known at compile time
sbaldu Sep 8, 2023
76e7fac
Update pybind submodule to sbaldu's fork, which fixes error by nvcc
sbaldu Oct 16, 2023
7038b8a
Add boost and alpaka submodules
sbaldu Oct 26, 2023
9814f35
Update paths of alpaka and boost in test makefile
sbaldu Dec 7, 2023
1fb83ec
Remove unneeded headers in AlpakaCore
sbaldu Dec 7, 2023
1d7f858
Use `cms::alpakatools::divide_up_by` when creating working divisions
sbaldu Dec 12, 2023
c602e17
Add CLUEAlgo member for block size and bind its setter
sbaldu Dec 13, 2023
23d2c47
Fix wrong include
sbaldu Dec 13, 2023
b80ae4f
Change block_size to parameter for mainRun
sbaldu Dec 14, 2023
b099674
Formatting
sbaldu Dec 14, 2023
f8585c7
Use `endswith` in _read_string
sbaldu Dec 14, 2023
4b8e5e8
Clean alpaka version of `CLUEstering.py`
sbaldu Dec 14, 2023
4ede444
Implement method for listing devices from python
sbaldu Dec 20, 2023
6e483e2
Add option for choosing device from python
sbaldu Dec 20, 2023
5f6d317
Cleaning and formatting
sbaldu Dec 20, 2023
8d59b4b
Remove unneeded boost submodule
sbaldu Jan 4, 2024
4d490b3
Prepare compilation for AMD GPUs
sbaldu Jan 5, 2024
98e60ab
Update `CLUEstering.py` and Makefile
sbaldu Jan 10, 2024
e88e23a
Uncommenting other dimensions run functions
sbaldu Jan 10, 2024
cf59df8
Clean Makefile
sbaldu Jan 15, 2024
778da30
Add conditional compilation of modules and handling of import errors
sbaldu Jan 16, 2024
67ddb8e
Formatting
sbaldu Jan 16, 2024
ac55fa4
Rename energy as weight or density
sbaldu Jan 18, 2024
7 changes: 7 additions & 0 deletions .gitmodules
@@ -0,0 +1,7 @@
[submodule "extern/pybind11"]
path = extern/pybind11
url = https://github.com/sbaldu/pybind11.git
branch = master
[submodule "extern/alpaka"]
path = extern/alpaka
url = https://github.com/cms-patatrack/alpaka.git
31 changes: 31 additions & 0 deletions CLUEstering/alpaka/AlpakaCore/AllocatorConfig.h
@@ -0,0 +1,31 @@
#ifndef AlpakaCore_AllocatorConfig_h
#define AlpakaCore_AllocatorConfig_h

#include <limits>

namespace cms::alpakatools {

namespace config {

// bin growth factor (bin_growth in cub::CachingDeviceAllocator)
constexpr unsigned int binGrowth = 2;

// smallest bin, corresponds to binGrowth^minBin bytes (min_bin in cub::CachingDeviceAllocator)
constexpr unsigned int minBin = 8; // 256 bytes

// largest bin, corresponds to binGrowth^maxBin bytes (max_bin in cub::CachingDeviceAllocator). Note that unlike in cub, allocations larger than binGrowth^maxBin are set to fail.
constexpr unsigned int maxBin = 30; // 1 GB

// total storage for the allocator; 0 means no limit.
constexpr size_t maxCachedBytes = 0;

// fraction of total device memory taken for the allocator; 0 means no limit.
constexpr double maxCachedFraction = 0.8;

// if both maxCachedBytes and maxCachedFraction are non-zero, the smallest resulting value is used.

} // namespace config

} // namespace cms::alpakatools

#endif // AlpakaCore_AllocatorConfig_h
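To see what these constants imply, the following is a minimal sketch (not code from this PR) of how a cub-style caching allocator maps a requested size onto a power-of-two bin using `binGrowth`, `minBin`, and `maxBin`; the helper names `binBytes` and `binFor` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Same constants as in AllocatorConfig.h.
constexpr unsigned int binGrowth = 2;
constexpr unsigned int minBin = 8;   // 2^8  = 256 bytes
constexpr unsigned int maxBin = 30;  // 2^30 = 1 GiB

// Size in bytes of a given bin: binGrowth^bin.
constexpr std::size_t binBytes(unsigned int bin) {
  std::size_t bytes = 1;
  for (unsigned int i = 0; i < bin; ++i)
    bytes *= binGrowth;
  return bytes;
}

// Round a request up to the smallest bin that fits.
// A result greater than maxBin signals an oversized request, which
// (unlike in cub) is set to fail rather than bypass the bins.
constexpr unsigned int binFor(std::size_t bytes) {
  unsigned int bin = minBin;
  while (bin <= maxBin && binBytes(bin) < bytes)
    ++bin;
  return bin;
}
```

With these settings every cached allocation is served from one of 23 bins between 256 bytes and 1 GiB, so a 1000-byte request occupies a 1024-byte block.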
51 changes: 51 additions & 0 deletions CLUEstering/alpaka/AlpakaCore/AllocatorPolicy.h
@@ -0,0 +1,51 @@
#ifndef AlpakaCore_AllocatorPolicy_h
#define AlpakaCore_AllocatorPolicy_h

#include <alpaka/alpaka.hpp>

namespace cms::alpakatools {

// Which memory allocator to use
// - Synchronous: (device and host) cudaMalloc/hipMalloc and cudaMallocHost/hipMallocHost
// - Asynchronous: (device only) cudaMallocAsync (requires CUDA >= 11.2)
// - Caching: (device and host) caching allocator
enum class AllocatorPolicy { Synchronous = 0, Asynchronous = 1, Caching = 2 };

template <typename TDev>
constexpr inline AllocatorPolicy allocator_policy = AllocatorPolicy::Synchronous;

#if defined ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED || defined ALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLED
template <>
constexpr inline AllocatorPolicy allocator_policy<alpaka::DevCpu> =
#if !defined ALPAKA_DISABLE_CACHING_ALLOCATOR
AllocatorPolicy::Caching;
#else
AllocatorPolicy::Synchronous;
#endif
#endif // defined ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED || defined ALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLED

#if defined ALPAKA_ACC_GPU_CUDA_ENABLED
template <>
constexpr inline AllocatorPolicy allocator_policy<alpaka::DevCudaRt> =
#if !defined ALPAKA_DISABLE_CACHING_ALLOCATOR
AllocatorPolicy::Caching;
#elif CUDA_VERSION >= 11020 && !defined ALPAKA_DISABLE_ASYNC_ALLOCATOR
AllocatorPolicy::Asynchronous;
#else
AllocatorPolicy::Synchronous;
#endif
#endif // ALPAKA_ACC_GPU_CUDA_ENABLED

#if defined ALPAKA_ACC_GPU_HIP_ENABLED
template <>
constexpr inline AllocatorPolicy allocator_policy<alpaka::DevHipRt> =
#if !defined ALPAKA_DISABLE_CACHING_ALLOCATOR
AllocatorPolicy::Caching;
#else
AllocatorPolicy::Synchronous;
#endif
#endif // ALPAKA_ACC_GPU_HIP_ENABLED

} // namespace cms::alpakatools

#endif // AlpakaCore_AllocatorPolicy_h
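The selection mechanism above is a C++ variable template specialized per device type, with the preprocessor picking the value inside each specialization. The pattern can be sketched in isolation with stand-in device tags in place of the alpaka device types (`FakeDevCpu` and `FakeDevCudaRt` are hypothetical):

```cpp
#include <cassert>

enum class AllocatorPolicy { Synchronous, Asynchronous, Caching };

struct FakeDevCpu {};     // stand-in for alpaka::DevCpu
struct FakeDevCudaRt {};  // stand-in for alpaka::DevCudaRt

// Default: any device type falls back to synchronous allocation...
template <typename TDev>
constexpr AllocatorPolicy allocator_policy = AllocatorPolicy::Synchronous;

// ...and each enabled backend overrides it with a full specialization,
// resolved entirely at compile time.
template <>
constexpr AllocatorPolicy allocator_policy<FakeDevCpu> = AllocatorPolicy::Caching;

template <>
constexpr AllocatorPolicy allocator_policy<FakeDevCudaRt> = AllocatorPolicy::Caching;
```

Because `allocator_policy<TDev>` is a `constexpr` value rather than a runtime switch, allocation code can branch on it with `if constexpr` and dead backends compile away.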
145 changes: 145 additions & 0 deletions CLUEstering/alpaka/AlpakaCore/CachedBufAlloc.h
@@ -0,0 +1,145 @@
#ifndef AlpakaCore_CachedBufAlloc_h
#define AlpakaCore_CachedBufAlloc_h

#include <alpaka/alpaka.hpp>

#include "getDeviceCachingAllocator.h"
#include "getHostCachingAllocator.h"

namespace cms::alpakatools {

namespace traits {

//! The caching memory allocator trait.
template <typename TElem, typename TDim, typename TIdx, typename TDev, typename TQueue, typename TSfinae = void>
struct CachedBufAlloc {
static_assert(alpaka::meta::DependentFalseType<TDev>::value, "This device does not support a caching allocator");
};

//! The caching memory allocator implementation for the CPU device
template <typename TElem, typename TDim, typename TIdx, typename TQueue>
struct CachedBufAlloc<TElem, TDim, TIdx, alpaka::DevCpu, TQueue, void> {
template <typename TExtent>
ALPAKA_FN_HOST static auto allocCachedBuf(alpaka::DevCpu const& dev, TQueue queue, TExtent const& extent)
-> alpaka::BufCpu<TElem, TDim, TIdx> {
// non-cached host-only memory
return alpaka::allocAsyncBuf<TElem, TIdx>(queue, extent);
}
};

#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED

//! The caching memory allocator implementation for the pinned host memory
template <typename TElem, typename TDim, typename TIdx>
struct CachedBufAlloc<TElem, TDim, TIdx, alpaka::DevCpu, alpaka::QueueCudaRtNonBlocking, void> {
template <typename TExtent>
ALPAKA_FN_HOST static auto allocCachedBuf(alpaka::DevCpu const& dev,
alpaka::QueueCudaRtNonBlocking queue,
TExtent const& extent) -> alpaka::BufCpu<TElem, TDim, TIdx> {
ALPAKA_DEBUG_MINIMAL_LOG_SCOPE;

auto& allocator = getHostCachingAllocator<alpaka::QueueCudaRtNonBlocking>();

// FIXME the BufCpu does not support a pitch ?
size_t size = alpaka::getExtentProduct(extent);
size_t sizeBytes = size * sizeof(TElem);
void* memPtr = allocator.allocate(sizeBytes, queue);

// use a custom deleter to return the buffer to the CachingAllocator
auto deleter = [alloc = &allocator](TElem* ptr) { alloc->free(ptr); };

return alpaka::BufCpu<TElem, TDim, TIdx>(dev, reinterpret_cast<TElem*>(memPtr), std::move(deleter), extent);
}
};

//! The caching memory allocator implementation for the CUDA device
template <typename TElem, typename TDim, typename TIdx, typename TQueue>
struct CachedBufAlloc<TElem, TDim, TIdx, alpaka::DevCudaRt, TQueue, void> {
template <typename TExtent>
ALPAKA_FN_HOST static auto allocCachedBuf(alpaka::DevCudaRt const& dev, TQueue queue, TExtent const& extent)
-> alpaka::BufCudaRt<TElem, TDim, TIdx> {
ALPAKA_DEBUG_MINIMAL_LOG_SCOPE;

auto& allocator = getDeviceCachingAllocator<alpaka::DevCudaRt, TQueue>(dev);

size_t width = alpaka::getWidth(extent);
size_t widthBytes = width * static_cast<TIdx>(sizeof(TElem));
// TODO implement pitch for TDim > 1
size_t pitchBytes = widthBytes;
size_t size = alpaka::getExtentProduct(extent);
size_t sizeBytes = size * sizeof(TElem);
void* memPtr = allocator.allocate(sizeBytes, queue);

// use a custom deleter to return the buffer to the CachingAllocator
auto deleter = [alloc = &allocator](TElem* ptr) { alloc->free(ptr); };

return alpaka::BufCudaRt<TElem, TDim, TIdx>(
dev, reinterpret_cast<TElem*>(memPtr), std::move(deleter), pitchBytes, extent);
}
};

#endif // ALPAKA_ACC_GPU_CUDA_ENABLED

#ifdef ALPAKA_ACC_GPU_HIP_ENABLED

//! The caching memory allocator implementation for the pinned host memory
template <typename TElem, typename TDim, typename TIdx>
struct CachedBufAlloc<TElem, TDim, TIdx, alpaka::DevCpu, alpaka::QueueHipRtNonBlocking, void> {
template <typename TExtent>
ALPAKA_FN_HOST static auto allocCachedBuf(alpaka::DevCpu const& dev,
alpaka::QueueHipRtNonBlocking queue,
TExtent const& extent) -> alpaka::BufCpu<TElem, TDim, TIdx> {
ALPAKA_DEBUG_MINIMAL_LOG_SCOPE;

auto& allocator = getHostCachingAllocator<alpaka::QueueHipRtNonBlocking>();

// FIXME the BufCpu does not support a pitch ?
size_t size = alpaka::getExtentProduct(extent);
size_t sizeBytes = size * sizeof(TElem);
void* memPtr = allocator.allocate(sizeBytes, queue);

// use a custom deleter to return the buffer to the CachingAllocator
auto deleter = [alloc = &allocator](TElem* ptr) { alloc->free(ptr); };

return alpaka::BufCpu<TElem, TDim, TIdx>(dev, reinterpret_cast<TElem*>(memPtr), std::move(deleter), extent);
}
};

//! The caching memory allocator implementation for the ROCm/HIP device
template <typename TElem, typename TDim, typename TIdx, typename TQueue>
struct CachedBufAlloc<TElem, TDim, TIdx, alpaka::DevHipRt, TQueue, void> {
template <typename TExtent>
ALPAKA_FN_HOST static auto allocCachedBuf(alpaka::DevHipRt const& dev, TQueue queue, TExtent const& extent)
-> alpaka::BufHipRt<TElem, TDim, TIdx> {
ALPAKA_DEBUG_MINIMAL_LOG_SCOPE;

auto& allocator = getDeviceCachingAllocator<alpaka::DevHipRt, TQueue>(dev);

size_t width = alpaka::getWidth(extent);
size_t widthBytes = width * static_cast<TIdx>(sizeof(TElem));
// TODO implement pitch for TDim > 1
size_t pitchBytes = widthBytes;
size_t size = alpaka::getExtentProduct(extent);
size_t sizeBytes = size * sizeof(TElem);
void* memPtr = allocator.allocate(sizeBytes, queue);

// use a custom deleter to return the buffer to the CachingAllocator
auto deleter = [alloc = &allocator](TElem* ptr) { alloc->free(ptr); };

return alpaka::BufHipRt<TElem, TDim, TIdx>(
dev, reinterpret_cast<TElem*>(memPtr), std::move(deleter), pitchBytes, extent);
}
};

#endif // ALPAKA_ACC_GPU_HIP_ENABLED

} // namespace traits

template <typename TElem, typename TIdx, typename TExtent, typename TQueue, typename TDev>
ALPAKA_FN_HOST auto allocCachedBuf(TDev const& dev, TQueue queue, TExtent const& extent = TExtent()) {
return traits::CachedBufAlloc<TElem, alpaka::Dim<TExtent>, TIdx, TDev, TQueue>::allocCachedBuf(dev, queue, extent);
}

} // namespace cms::alpakatools

#endif // AlpakaCore_CachedBufAlloc_h
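The key idea in `CachedBufAlloc` is the custom deleter: the buffer hands its memory back to the caching allocator instead of freeing it, so the next allocation can be served from the cache. The following is a self-contained sketch of that pattern under simplified assumptions (`ToyAllocator` is hypothetical and ignores block sizes; the real allocator bins by size and is device-aware):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Toy cache: free() stores the block, allocate() reuses it when possible.
class ToyAllocator {
public:
  ~ToyAllocator() {
    for (void* p : cache_)
      ::operator delete(p);
  }
  void* allocate(std::size_t bytes) {
    if (!cache_.empty()) {  // serve from cache instead of the system heap
      void* p = cache_.back();
      cache_.pop_back();
      ++reused_;
      return p;
    }
    return ::operator new(bytes);
  }
  void free(void* ptr) { cache_.push_back(ptr); }  // cache, don't release
  int reused() const { return reused_; }

private:
  std::vector<void*> cache_;
  int reused_ = 0;
};

// Buffer whose deleter returns the memory to the allocator, mirroring
//   auto deleter = [alloc = &allocator](TElem* ptr) { alloc->free(ptr); };
using CachedBuf = std::unique_ptr<float, std::function<void(float*)>>;

CachedBuf makeCachedBuf(ToyAllocator& alloc, std::size_t n) {
  auto* mem = static_cast<float*>(alloc.allocate(n * sizeof(float)));
  return CachedBuf(mem, [&alloc](float* p) { alloc.free(p); });
}
```

As in `allocCachedBuf`, destroying the buffer does not release the memory; it parks it in the cache, and a subsequent allocation of the same size is served without touching the underlying (device) allocator.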