After coding the kernel, you will now learn about the host application. The host application is written in either C or C++ and uses OpenCL API calls to interact with the FPGA accelerators.
Example host code is provided in the ./reference-files/src folder in both a C++ version and a C version. In this tutorial, you will only be looking at the C++ version.
- The C++ version of the host code can be found in the host.cpp file.
- The C version of the host code can be found in the host.c file.
In general, you can divide the structure of the host application into three sections:
- Setting up the hardware.
- Executing the kernels.
- Releasing the hardware resources after the kernel returns.
As you go through this tutorial, you will be looking at each step individually.
The application must start by setting up and initializing the FPGA. This typically involves the following steps:
- Retrieving the list of available Xilinx platforms.
- Retrieving the list of devices supported by each Xilinx platform.
- Creating a context.
- Creating a command queue.
- Creating a program object from the pre-compiled FPGA binary (xclbin).
- Creating a kernel object.
As you work through this section, refer to step 1 in the host.cpp file.
TIP: This lab references the C++ code, but C code is also provided in the reference files. For more information on the specific OpenCL API calls listed here, refer to the OpenCL Reference Pages.
- The application should start by identifying the platforms composed of Xilinx FPGA devices. To identify the presence of Xilinx platforms, use the cl::Platform::get OpenCL API. This call returns the available platforms in the system.
      cl::Platform::get(&platforms)
  After retrieving the available platforms, the host checks each platform for a particular vendor. Because each platform contains a vendor's installation, a system can have a mix of platforms. The cl::Platform::getInfo API call returns specific information about an available OpenCL platform. In this host code, you retrieve the platform name, which identifies the vendor, and compare it with the user input (Xilinx).
      platform.getInfo<CL_PLATFORM_NAME>(&err)
  Now the host needs to select a particular device from the respective platform. This is done using the cl::Platform::getDevices API.
      platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices)
  NOTE: In the current C++ code, the above operations are combined inside a user-defined get_devices function defined in the host.hpp file.
      get_devices("Xilinx")
- After you select the platform and the device, you need to create a context. The runtime uses the context to manage objects such as command queues and kernel objects. The context is created using the cl::Context OpenCL API.
      cl::Context context(device, NULL, NULL, NULL, &err)
- After creating a context, you create a command queue. The application places commands in this queue for actions like transferring data, executing kernels, and synchronizing. These commands are then scheduled on the devices within the context. The command queue is created using the cl::CommandQueue OpenCL API.
      cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE, &err)
- Next, you need to create a program object. The program object is created from the pre-compiled FPGA binary file (xclbin). It contains the collection of user-defined kernel functions and is programmed onto the FPGA.
  TIP: The xclbin is the compiled kernel binary created as explained in Building an Application.
  First, the application needs to read the contents of the xclbin file. In this tutorial, you use a user-defined function, read_binary_file, to achieve this. This function returns a pointer to the contents of the xclbin file.
      fileBuf = read_binary_file(binaryFile, fileBufSize)
  Then, create a cl::Program::Binaries object to store the contents of the xclbin binary file.
      cl::Program::Binaries bins{{fileBuf, fileBufSize}}
  Lastly, create the program object and initialize it with the contents of the xclbin binary stored in the bins variable. To do this, use the cl::Program API.
      cl::Program program(context, devices, bins, NULL, &err)
  This step programs the FPGA with the binary loaded in the bins variable. If successful, err is set to CL_SUCCESS; make sure to check the return code.
- Next, you must create kernel objects. Kernel objects are handles that the software application uses to pass arguments to the actual hardware kernels and to execute them. Kernel objects are created using the cl::Kernel API.
      cl::Kernel krnl_vector_add(program, "vadd", &err)
NOTE: The mentioned operations are common to most applications and can be reused.
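The read_binary_file helper used in the program-object step is user-defined, so its exact implementation is not shown here. The following is a minimal sketch of what such a helper might look like using only the C++ standard library. The name read_binary_file_sketch and its signature are assumptions made for illustration; unlike the tutorial's helper, which returns a pointer and a size, this version returns a std::vector so the memory is released automatically.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the tutorial's read_binary_file helper.
// Reads the whole file at `path` into memory and returns its bytes.
std::vector<unsigned char> read_binary_file_sketch(const std::string &path) {
    // Open in binary mode, positioned at the end, so tellg() reports the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);

    // Allocate the buffer and read the file contents in one call.
    std::vector<unsigned char> contents(static_cast<std::size_t>(size));
    if (size > 0 &&
        !file.read(reinterpret_cast<char *>(contents.data()),
                   static_cast<std::streamsize>(size))) {
        throw std::runtime_error("cannot read " + path);
    }
    return contents;
}
```

With a helper like this, contents.data() and contents.size() would play the roles of fileBuf and fileBufSize when filling in the bins variable.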
Now that you have set up the hardware, the host application is ready to issue commands to the device and interact with the kernel. These commands include:
- Buffer transfer to/from the FPGA
- Setting up the kernel arguments
- Kernel execution on the FPGA
- Event Synchronization
As you work through this section, refer to step 2 in the host.cpp file.
- First, you must create buffers in the global memory. Buffers are used to transfer data back and forth between the host and the device. Kernels read, process, and write back data in these buffers. Buffer objects are created using the cl::Buffer API.
      cl::Buffer buffer_in1(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, vector_size_bytes, source_in1.data(), &err)
  You will create the following buffers:
  - buffer_in1: Stores source_in1
  - buffer_in2: Stores source_in2
  - buffer_output: Stores results (source_hw_results)
- Before executing the kernel, you need to set each of its arguments. Kernel arguments are either scalar values or buffer objects. Kernel arguments are set using the cl::Kernel::setArg API.
      krnl_vector_add.setArg(0, buffer_in1)
  This passes to the kernel the pointers to where the input data is located and where the output should be stored, along with the size of each buffer. The following arguments are set:
  - in1 (input): Input Vector1
  - in2 (input): Input Vector2
  - out (output): Output Vector
  - size (input): Size of Vector in Integer
- Next, request the transfer of input data from the host memory to the device memory (global memory) using the cl::CommandQueue::enqueueMigrateMemObjects API.
      q.enqueueMigrateMemObjects({buffer_in1, buffer_in2}, 0 /* 0 means from host */)
- Now, request the execution of the kernel using the cl::CommandQueue::enqueueTask API.
      q.enqueueTask(krnl_vector_add)
- Then, request the transfer of the output results from the device global memory to the host memory. To do this, use the cl::CommandQueue::enqueueMigrateMemObjects API.
      q.enqueueMigrateMemObjects({buffer_output}, CL_MIGRATE_MEM_OBJECT_HOST)
- Lastly, wait for the completion of all the requests placed in the command queue.
      q.finish();
It is important to understand that an "enqueue" API call does not actually execute the specified command; it only requests its execution. When an "enqueue" function returns, it does not mean that the command has actually been executed. It is up to the runtime to schedule the execution of the command. Therefore, the application must use synchronization methods to know when commands have completed.
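The difference between requesting a command and having it executed can be illustrated with a toy model. The sketch below is not the OpenCL API: ToyCommandQueue is a hypothetical class that records commands when they are enqueued and runs them only when finish is called. A real runtime may start a command at any point after it is enqueued; the toy defers everything to finish purely to make the contract visible.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy model of an in-order command queue (hypothetical, not OpenCL).
// It only mimics the "enqueue now, execute later" contract.
class ToyCommandQueue {
public:
    // Records the command and returns immediately without running it.
    void enqueue(std::function<void()> command) {
        pending_.push(std::move(command));
    }

    // Runs every recorded command in submission order and returns
    // once the queue is empty.
    void finish() {
        while (!pending_.empty()) {
            std::function<void()> command = std::move(pending_.front());
            pending_.pop();
            command();
        }
    }

    // Number of commands that have been requested but not yet executed.
    std::size_t pending() const { return pending_.size(); }

private:
    std::queue<std::function<void()>> pending_;
};
```

In this model, any result produced by an enqueued command is only safe to read after finish returns, which mirrors why the host code calls q.finish() before using the output buffer.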
The last step in building the host application is releasing the objects. As you work through this section, refer to step 3 in the host.cpp file. The C++ wrapper releases the objects automatically once the object passes out of scope.
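Release on scope exit can be sketched with a small stand-in class. ScopedHandle, live_handles, and handles_inside_scope are hypothetical names introduced only for this illustration; the actual C++ wrapper classes also manage reference counts when objects are copied, which this sketch ignores by making the class non-copyable.

```cpp
// Counts how many hypothetical resources are currently held,
// so that the automatic release can be observed.
static int live_handles = 0;

// Stand-in for a wrapper object that owns a runtime resource.
class ScopedHandle {
public:
    ScopedHandle() { ++live_handles; }   // acquire on construction
    ~ScopedHandle() { --live_handles; }  // release when leaving scope

    // Copying is disabled to keep the ownership model trivial.
    ScopedHandle(const ScopedHandle &) = delete;
    ScopedHandle &operator=(const ScopedHandle &) = delete;
};

// Creates two handles and reports how many are live while they are in
// scope. Both are released automatically when the function returns.
int handles_inside_scope() {
    ScopedHandle buffer;
    ScopedHandle kernel;
    return live_handles;
}
```

No explicit release call appears anywhere: the count rises while the objects exist and drops back once their scope ends.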
The next step in this tutorial is to compile, link, and run the application and the kernel.
Copyright© 2019 Xilinx