-
Notifications
You must be signed in to change notification settings - Fork 2
Extending HIP: Mean Filter
This guide provides an in-depth example of coding a new filter in the Hydra Image Processing (HIP) library. For this example we will go through the process of adding a windowed averaging filter, or mean filter. The final mean-filter code is can be viewed in: src/c/Cuda/CudaMeanFilter.cuh.
-
Make sure you have Git-LFS installed:
git lfs install
-
Clone the most recent version of the HIP library source code:
git clone https://github.com/ericwait/hydra-image-processor.git
-
Copy
src/c/Cuda/_TemplateKernel.cuh
and rename to match the desired new filter e.g.src/c/Cuda/CudaMeanFilter.cuh
-
Rename
cudaFooFilter()
function (e.g.cudaMeanFilter
) and modify function parameters as necessary:NOTE: Most filters internally compute results using
double
precision arithmetic.minValue
andmaxValue
should be passed as parameters to convert the final result fromdouble
toPixelTypeOut
.template <class PixelTypeIn, class PixelTypeOut> __global__ void cudaMeanFilter(CudaImageContainer<PixelTypeIn> imageIn, CudaImageContainer<PixelTypeOut> imageOut, Kernel constKernelMem, PixelTypeOut minValue, PixelTypeOut maxValue) {
-
Implement CUDA functionality in the renamed
cudaFooFilter()
function for most filters this should only require modifying the innermost commented region.Helpers:
-
GetThreadBlockCoordinate
- provides access to the current center pixel coordinate -
KernelIterator
- class handles accessing an image neighborhood based on an inputKernel
class
Vec<std::size_t> threadCoordinate; GetThreadBlockCoordinate(threadCoordinate); if (threadCoordinate < imageIn.getDims()) { KernelIterator kIt(threadCoordinate, imageIn.getDims(), constKernelMem.getDims()); double outVal = 0.0; // Added a counter for renormalization double count = 0.0; for (; !kIt.end(); ++kIt) { Vec<float> imInPos = kIt.getImageCoordinate(); double inVal = (double)imageIn(imInPos); Vec<std::size_t> coord = kIt.getKernelCoordinate(); float kernVal = constKernelMem(coord);
In the case of the mean filter, we sum the values in the non-zero kernel region. Kernel renormalization is simple to implement, in this case, we maintain a second accumulator (
count
) that counts the pixels visited (NOTE: in general the normalizing accumulator should sum over the kernel weights see e.g.cudaMultiplySum
)if (kernVal != 0.0f) { outVal += inVal; count += 1.0; }
Compute the mean and convert the output value to the appropriate image type:
} imageOut(threadCoordinate) = (PixelTypeOut)CLAMP(outVal/count, minValue, maxValue); } }
-
-
Within the template
.cuh
file, renamecFooFilter()
function (e.g.cMeanFilter
) and again modify function parameters as necessary:template <class PixelTypeIn, class PixelTypeOut> void cMeanFilter(ImageView<PixelTypeIn> imageIn, ImageView<PixelTypeOut> imageOut, ImageView<float> kernel, int device = -1) {
-
Implement wrapper functionality, for most kernels this should only require calling the
cuda
filter function implemented in the previous section.Helpers:
-
CudaDevices
- computes GPU device details for a specific CUDA kernel, used in computing maximum input/output buffer sizes -
calculateBuffers
- function computes the maximum GPU buffer sizes as well as thethreads/blocks
for CUDA call. -
CudaDeviceImages
- template class for memory allocation and IO on GPU device, can be used for swapping GPU buffers in the case of multiple sequential operations (e.g. seeCudaOpener.cuh
)
const PixelTypeOut MIN_VAL = std::numeric_limits<PixelTypeOut>::lowest(); const PixelTypeOut MAX_VAL = std::numeric_limits<PixelTypeOut>::max(); const int NUM_BUFF_NEEDED = 2; CudaDevices cudaDevs(cudaMeanFilter<PixelTypeIn, PixelTypeOut>, device); std::size_t maxTypeSize = MAX(sizeof(PixelTypeIn), sizeof(PixelTypeOut)); std::vector<ImageChunk> chunks = calculateBuffers(imageIn.getDims(), NUM_BUFF_NEEDED, cudaDevs, maxTypeSize, kernel.getSpatialDims()); Vec<std::size_t> maxDeviceDims; setMaxDeviceDims(chunks, maxDeviceDims); omp_set_num_threads(MIN(chunks.size(), cudaDevs.getNumDevices())); #pragma omp parallel default(shared) { const int CUDA_IDX = omp_get_thread_num(); const int N_THREADS = omp_get_num_threads(); const int CUR_DEVICE = cudaDevs.getDeviceIdx(CUDA_IDX); CudaDeviceImages<PixelTypeOut> deviceImages(NUM_BUFF_NEEDED, maxDeviceDims, CUR_DEVICE); Kernel constKernelMem(kernel,CUR_DEVICE); for (int i = CUDA_IDX; i < chunks.size(); i += N_THREADS) {
For the mean filter we simply need to call
cudaMeanFilter
:if (!chunks[i].sendROI(imageIn, deviceImages.getCurBuffer())) std::runtime_error("Error sending ROI to device!"); deviceImages.setAllDims(chunks[i].getFullChunkSize()); cudaMeanFilter<<<chunks[i].blocks, chunks[i].threads>>>( *(deviceImages.getCurBuffer()), *(deviceImages.getNextBuffer()), constKernelMem, MIN_VAL, MAX_VAL); deviceImages.incrementBuffer(); chunks[i].retriveROI(imageOut, deviceImages.getCurBuffer());
Clean-up kernel memory if necessary
} constKernelMem.clean(); } }
-
The newly created header (CudaMeanFilter.cuh
) must be included in src/c/Cuda/CWrapperAutogen.cu
:
// CWrapperAutogen.cu
...
#include "CudaMeanFilter.cuh"
...
-
Add the interface definition to
src/c/ScriptCmds/ScriptCommands.h
// ScriptCommands.h ... SCR_CMD(MeanFilter, SCR_PARAMS ( SCR_INPUT(SCR_IMAGE(SCR_DYNAMIC), imageIn), SCR_OUTPUT(SCR_IMAGE(SCR_DYNAMIC), imageOut), SCR_INPUT(SCR_IMAGE_CONVERT(float), kernel), SCR_OPTIONAL(SCR_SCALAR(int), device, -1) ), cMeanFilter ) ...
Macro Information:
-
SCR_CMD(Name, Params, CudaFunc)
- This macro is the key to defining a new script interface command.-
Name
- The name of the command when called from script (e.g.MeanFilter
will be called asHIP.MeanFilter(...)
in MATLAB/Python) -
Params
- List of parameters in the EXACT same order as defined in the CUDA C++ wrapper -
CudaFunc
- The name of the CUDA C++ wrapper defined above (e.g.cMeanFilter
)
-
-
SCR_PARAMS(...)
- This macro is used to interpret the parameter list and MUST be used as a wrapper for theParams
input toSCR_CMD
. NOTE: Script functions that have no parameters (input/output) are not supported. -
SCR_INPUT(TypeMacro, Name)
- Defines an input parameter -
SCR_OUTPUT(TypeMacro, Name)
- Defines an output parameter -
SCR_OPTIONAL(TypeMacro, Name, DefVal)
- Defines an optional input parameter with a default value - Supported
TypeMacro
values:-
SCR_SCALAR(Type)
- A single numeric input/output. -
SCR_VECTOR(Type)
- A 3-vector of numeric values. -
SCR_IMAGE(Type)
- An input NumPy array or MATLAB array representing up to a 5-D volume. -
SCR_IMAGE_CONVERT(Type)
- An input NumPy array or MATLAB array representing up to a 5-D volume. The input will be copy-converted internally to the specified C++Type
.
-
NOTE: For all
TypeMacros
,Type
must be a standard C++ numeric type (e.g.int
,float
) orSCR_DYNAMIC
.SCR_DYNAMIC
indicates that the type will be determined based on the input NumPy/MATLAB array type, this should generally only be used for the input/output images.NOTE: The script interface makes use of C++ macros and template metaprogramming to simplify the definition of new filters. Mistakes in these definitions can create very long and difficult to understand compiler errors. To help in identifying the root cause of errors, we have added static_assert blocks with the prefix
HIP_COMPILE
. Searching the compiler output forHIP_COMPILE
is a good first step if errors are encountered. -
-
Create a script command class (e.g.
src/c/ScriptCmd/Commands/ScrCmdMeanFilter.h
)For most filters only a help string need be defined in the class definition using the
SCR_HELP_STRING(Str)
macro.#pragma once #include "ScriptCommandImpl.h" #include "ScriptCommandDefines.h" SCR_COMMAND_CLASSDEF(MeanFilter) { public: SCR_HELP_STRING("This will take the mean of the given neighborhood.\n" "\timageIn = This is a one to five dimensional array. The first three dimensions are treated as spatial.\n" "\t\tThe spatial dimensions will have the kernel applied. The last two dimensions will determine\n" "\t\thow to stride or jump to the next spatial block.\n" "\n" "\timageOut = This will be an array of the same type and shape as the input array.\n" "\n" "\tkernel = This is a one to three dimensional array that will be used to determine neighborhood operations.\n" "\t\tIn this case, the positions in the kernel that do not equal zeros will be evaluated.\n" "\t\tIn other words, this can be viewed as a structuring element for the max neighborhood.\n" "\n" "\tdevice (optional) = Use this if you have multiple devices and want to select one explicitly.\n" "\t\tSetting this to [] allows the algorithm to either pick the best device and/or will try to split\n" "\t\tthe data across multiple devices.\n"); };
-
Include the new class header (e.g.
ScrCmdMeanFilter.h
) insrc/c/ScriptCmds/ScriptCommandModule.h
// ScriptCommandModule.h ... #include "Commands/ScrCmdMeanFilter.h" ...
-
(Optional) Define type mappings in
src/c/ScriptCmds/ScriptioMaps.h
If the filter returns a different output image type than the input image type, a type mapping must be defined for each supported type. This can be done in
ScriptioMaps.h
.
At this point the filter should be fully defined and it should compile and generate both MATLAB/Python bindings for the new command. For detailed compilation instructions see Compiling