This simple program showcases the usage of the device function `rocprim::reduce`.
- Input data is instantiated in a `std::vector<int>` and the values are printed to the standard output.
- Device storage for input and output data is allocated using `hipMalloc`.
- Input data is copied from the host to the device using `hipMemcpy`.
- The binary operator used in the reduction is instantiated. This example calculates the sum of the elements of the input vector, hence `rocprim::plus<int>` is the appropriate choice.
- The amount of working memory needed by the reduction algorithm is calculated by a first call to `rocprim::reduce`. A `nullptr` is passed as the first argument, so the function only calculates the value of `temp_storage_bytes` and returns without launching the GPU kernel.
- `temp_storage_bytes` bytes of memory are allocated on the device.
- A subsequent call to `rocprim::reduce` is made, this time passing the pointer to the working memory. This launches the GPU kernel that performs the calculation.
- The result of the summation is copied back to the host and is printed to the standard output.
- All device memory is freed using `hipFree`.
- rocPRIM provides HIP parallel primitives on multiple levels of the GPU programming model. This example showcases `rocprim::reduce`, which is a device-level function: it operates on data in device memory and is called from host code rather than from inside a kernel.
- The `rocprim::reduce` template function performs a generalized reduction, i.e. it combines a vector of values into a single value using the provided binary operator. Since the order of execution is not determined, the provided operator must be associative. In the example, addition (`rocprim::plus<int>`) is used, which fulfils this property.
- The device functions of rocPRIM require a temporary device memory location to store the results of intermediate calculations. The required amount of temporary storage can be calculated by invoking the function with the same set of arguments, except that the first argument, `temporary_storage`, must be a `nullptr`. In this case, the GPU kernel is not launched.
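
To see why associativity matters, the following host-only sketch compares a left-to-right fold with a pairwise (tree-shaped) reduction. It does not reproduce how rocPRIM actually groups operands on the GPU; `sequential_reduce` and `pairwise_reduce` are hypothetical helpers written only to show that regrouping leaves the result unchanged for an associative operator and can change it for a non-associative one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Left-to-right fold: ((v0 op v1) op v2) op ...
template<class T, class Op>
T sequential_reduce(const std::vector<T>& v, Op op)
{
    assert(!v.empty());
    T acc = v[0];
    for(std::size_t i = 1; i < v.size(); ++i)
    {
        acc = op(acc, v[i]);
    }
    return acc;
}

// Pairwise reduction: neighbours are combined round by round, so the
// operands are grouped differently than in the left-to-right fold.
template<class T, class Op>
T pairwise_reduce(std::vector<T> v, Op op)
{
    assert(!v.empty());
    while(v.size() > 1)
    {
        std::vector<T> next;
        for(std::size_t i = 0; i + 1 < v.size(); i += 2)
        {
            next.push_back(op(v[i], v[i + 1]));
        }
        if(v.size() % 2 == 1)
        {
            next.push_back(v.back()); // odd element carries over unchanged
        }
        v = std::move(next);
    }
    return v[0];
}
```

For the input `{1, 2, 3, 4}`, both functions return 10 with `std::plus<int>`. With `std::minus<int>`, which is not associative, the fold gives `((1 - 2) - 3) - 4 = -8` while the pairwise version gives `(1 - 2) - (3 - 4) = 0`.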

API calls demonstrated in this example:

- `rocprim::reduce`
- `rocprim::plus`
- `hipMalloc`
- `hipMemcpy`
- `hipFree`