Skip to content

Extending HIP: Mean Filter

Mark Winter edited this page Mar 29, 2019 · 2 revisions

Introduction

This guide provides an in-depth example of coding a new filter in the Hydra Image Processing (HIP) library. For this example we will go through the process of adding a windowed averaging filter, or mean filter. The final mean-filter code is can be viewed in: src/c/Cuda/CudaMeanFilter.cuh.

Prerequisites

  1. Make sure you have Git-LFS installed:

    git lfs install
    
  2. Clone the most recent version of the HIP library source code:

    git clone https://github.com/ericwait/hydra-image-processor.git
    

Implementing the CUDA filter

  1. Copy src/c/Cuda/_TemplateKernel.cuh and rename to match the desired new filter e.g. src/c/Cuda/CudaMeanFilter.cuh

  2. Rename cudaFooFilter() function (e.g. cudaMeanFilter) and modify function parameters as necessary:

    NOTE: Most filters internally compute results using double precision arithmetic. minValue and maxValue should be passed as parameters to convert the final result from double to PixelTypeOut.

    template <class PixelTypeIn, class PixelTypeOut>
    __global__ void cudaMeanFilter(CudaImageContainer<PixelTypeIn> imageIn,
                                CudaImageContainer<PixelTypeOut> imageOut,
                                Kernel constKernelMem,
                                PixelTypeOut minValue,
                                PixelTypeOut maxValue)
    {
  3. Implement CUDA functionality in the renamed cudaFooFilter() function for most filters this should only require modifying the innermost commented region.

    Helpers:

        Vec<std::size_t> threadCoordinate;
        GetThreadBlockCoordinate(threadCoordinate);
    
        if (threadCoordinate < imageIn.getDims())
        {
            KernelIterator kIt(threadCoordinate, imageIn.getDims(), constKernelMem.getDims());
            double outVal = 0.0;
            // Added a counter for renormalization
            double count = 0.0;
            for (; !kIt.end(); ++kIt)
            {
                Vec<float> imInPos = kIt.getImageCoordinate();
                double inVal = (double)imageIn(imInPos);
                Vec<std::size_t> coord = kIt.getKernelCoordinate();
                float kernVal = constKernelMem(coord);

    In the case of the mean filter, we sum the values in the non-zero kernel region. Kernel renormalization is simple to implement, in this case, we maintain a second accumulator (count) that counts the pixels visited (NOTE: in general the normalizing accumulator should sum over the kernel weights see e.g. cudaMultiplySum)

                if (kernVal != 0.0f)
                {
                    outVal += inVal;
                    count += 1.0;
                }

    Compute the mean and convert the output value to the appropriate image type:

            }
            imageOut(threadCoordinate) = (PixelTypeOut)CLAMP(outVal/count, minValue, maxValue);
        }
    }

Implementing the C++/CUDA wrapper

  1. Within the template .cuh file, rename cFooFilter() function (e.g. cMeanFilter) and again modify function parameters as necessary:

    template <class PixelTypeIn, class PixelTypeOut>
    void cMeanFilter(ImageView<PixelTypeIn> imageIn,
                    ImageView<PixelTypeOut> imageOut,
                    ImageView<float> kernel,
                    int device = -1)
    {
  2. Implement wrapper functionality, for most kernels this should only require calling the cuda filter function implemented in the previous section.

    Helpers:

    • CudaDevices - computes GPU device details for a specific CUDA kernel, used in computing maximum input/output buffer sizes
    • calculateBuffers - function computes the maximum GPU buffer sizes as well as the threads/blocks for CUDA call.
    • CudaDeviceImages - template class for memory allocation and IO on GPU device, can be used for swapping GPU buffers in the case of multiple sequential operations (e.g. see CudaOpener.cuh)
        const PixelTypeOut MIN_VAL = std::numeric_limits<PixelTypeOut>::lowest();
        const PixelTypeOut MAX_VAL = std::numeric_limits<PixelTypeOut>::max();
        const int NUM_BUFF_NEEDED = 2;
    
        CudaDevices cudaDevs(cudaMeanFilter<PixelTypeIn, PixelTypeOut>, device);
    
        std::size_t maxTypeSize = MAX(sizeof(PixelTypeIn), sizeof(PixelTypeOut));
        std::vector<ImageChunk> chunks = calculateBuffers(imageIn.getDims(), NUM_BUFF_NEEDED, cudaDevs, maxTypeSize, kernel.getSpatialDims());
    
        Vec<std::size_t> maxDeviceDims;
        setMaxDeviceDims(chunks, maxDeviceDims);
    
        omp_set_num_threads(MIN(chunks.size(), cudaDevs.getNumDevices()));
        #pragma omp parallel default(shared)
        {
            const int CUDA_IDX = omp_get_thread_num();
            const int N_THREADS = omp_get_num_threads();
            const int CUR_DEVICE = cudaDevs.getDeviceIdx(CUDA_IDX);
    
            CudaDeviceImages<PixelTypeOut> deviceImages(NUM_BUFF_NEEDED, maxDeviceDims, CUR_DEVICE);
            Kernel constKernelMem(kernel,CUR_DEVICE);
    
            for (int i = CUDA_IDX; i < chunks.size(); i += N_THREADS)
            {

    For the mean filter we simply need to call cudaMeanFilter:

                if (!chunks[i].sendROI(imageIn, deviceImages.getCurBuffer()))
                    std::runtime_error("Error sending ROI to device!");
    
                deviceImages.setAllDims(chunks[i].getFullChunkSize());
    
                cudaMeanFilter<<<chunks[i].blocks, chunks[i].threads>>>(
                                            *(deviceImages.getCurBuffer()),
                                            *(deviceImages.getNextBuffer()),
                                            constKernelMem, MIN_VAL, MAX_VAL);
    
                deviceImages.incrementBuffer();
    
                chunks[i].retriveROI(imageOut, deviceImages.getCurBuffer());
    

    Clean-up kernel memory if necessary

            }
    
            constKernelMem.clean();
        }
    }

Include the new filter in image processing library

The newly created header (CudaMeanFilter.cuh) must be included in src/c/Cuda/CWrapperAutogen.cu:

// CWrapperAutogen.cu
...
#include "CudaMeanFilter.cuh"
...

Defining script interface for the filter

  1. Add the interface definition to src/c/ScriptCmds/ScriptCommands.h

    // ScriptCommands.h
    ...
    SCR_CMD(MeanFilter, SCR_PARAMS
    	(
    		SCR_INPUT(SCR_IMAGE(SCR_DYNAMIC), imageIn),
    		SCR_OUTPUT(SCR_IMAGE(SCR_DYNAMIC), imageOut),
    		SCR_INPUT(SCR_IMAGE_CONVERT(float), kernel),
    		SCR_OPTIONAL(SCR_SCALAR(int), device, -1)
    	),
    	cMeanFilter
    )
    ...

    Macro Information:

    • SCR_CMD(Name, Params, CudaFunc) - This macro is the key to defining a new script interface command.
      • Name - The name of the command when called from script (e.g. MeanFilter will be called as HIP.MeanFilter(...) in MATLAB/Python)
      • Params - List of parameters in the EXACT same order as defined in the CUDA C++ wrapper
      • CudaFunc - The name of the CUDA C++ wrapper defined above (e.g. cMeanFilter)
    • SCR_PARAMS(...) - This macro is used to interpret the parameter list and MUST be used as a wrapper for the Params input to SCR_CMD. NOTE: Script functions that have no parameters (input/output) are not supported.
    • SCR_INPUT(TypeMacro, Name) - Defines an input parameter
    • SCR_OUTPUT(TypeMacro, Name) - Defines an output parameter
    • SCR_OPTIONAL(TypeMacro, Name, DefVal) - Defines an optional input parameter with a default value
    • Supported TypeMacro values:
      • SCR_SCALAR(Type) - A single numeric input/output.
      • SCR_VECTOR(Type) - A 3-vector of numeric values.
      • SCR_IMAGE(Type) - An input NumPy array or MATLAB array representing up to a 5-D volume.
      • SCR_IMAGE_CONVERT(Type) - An input NumPy array or MATLAB array representing up to a 5-D volume. The input will be copy-converted internally to the specified C++ Type.

    NOTE: For all TypeMacros, Type must be a standard C++ numeric type (e.g. int, float) or SCR_DYNAMIC. SCR_DYNAMIC indicates that the type will be determined based on the input NumPy/MATLAB array type, this should generally only be used for the input/output images.

    NOTE: The script interface makes use of C++ macros and template metaprogramming to simplify the definition of new filters. Mistakes in these definitions can create very long and difficult to understand compiler errors. To help in identifying the root cause of errors, we have added static_assert blocks with the prefix HIP_COMPILE. Searching the compiler output for HIP_COMPILE is a good first step if errors are encountered.

  2. Create a script command class (e.g. src/c/ScriptCmd/Commands/ScrCmdMeanFilter.h)

    For most filters only a help string need be defined in the class definition using the SCR_HELP_STRING(Str) macro.

    #pragma once
    
    #include "ScriptCommandImpl.h"
    #include "ScriptCommandDefines.h"
    
    SCR_COMMAND_CLASSDEF(MeanFilter)
    {
    public:
        SCR_HELP_STRING("This will take the mean of the given neighborhood.\n"
                        "\timageIn = This is a one to five dimensional array. The first three dimensions are treated as spatial.\n"
                        "\t\tThe spatial dimensions will have the kernel applied. The last two dimensions will determine\n"
                        "\t\thow to stride or jump to the next spatial block.\n"
                        "\n"
                        "\timageOut = This will be an array of the same type and shape as the input array.\n"
                        "\n"
                        "\tkernel = This is a one to three dimensional array that will be used to determine neighborhood operations.\n"
                        "\t\tIn this case, the positions in the kernel that do not equal zeros will be evaluated.\n"
                        "\t\tIn other words, this can be viewed as a structuring element for the max neighborhood.\n"
                        "\n"
                        "\tdevice (optional) = Use this if you have multiple devices and want to select one explicitly.\n"
                        "\t\tSetting this to [] allows the algorithm to either pick the best device and/or will try to split\n"
                        "\t\tthe data across multiple devices.\n");
    };
  3. Include the new class header (e.g. ScrCmdMeanFilter.h) in src/c/ScriptCmds/ScriptCommandModule.h

    // ScriptCommandModule.h
    ...
    #include "Commands/ScrCmdMeanFilter.h"
    ...
  4. (Optional) Define type mappings in src/c/ScriptCmds/ScriptioMaps.h

    If the filter returns a different output image type than the input image type, a type mapping must be defined for each supported type. This can be done in ScriptioMaps.h.

Compile

At this point the filter should be fully defined and it should compile and generate both MATLAB/Python bindings for the new command. For detailed compilation instructions see Compiling