OpenCL Cookbook: Hello World using C++ host binding

Last time, in the OpenCL Cookbook series, I presented a hello world example using OpenCL and C for the host binding language. This time I present a very similar example using the C++ host binding language. As you know from previous posts, the host language that interfaces with an OpenCL device can be one of several languages, such as C, C++, Java, C# and Python.

So far I’ve been using the C API, but I’ve decided to switch to the C++ API for two reasons: (1) it is more succinct, needing considerably fewer lines of code, and (2) it supports exceptions, which means you do not have to check the error code of every binding call you write. So here follows a brief primer on the C++ OpenCL binding. It’s a very simple example, but trust me: we’ll be getting to more complex examples soon (time is the issue).

OpenCL kernel

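__kernel void hello_world (__global char* message, int messageSize) {
	// printf is built into OpenCL C 1.2; older versions need a vendor extension
	for (int i = 0; i < messageSize; i++) {
		printf("%c", message[i]);  // %c prints one character; %s expects a string
	}
}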

The kernel (OpenCL function) above receives a pointer to a char array (in essence a string) from the host, as well as the number of characters in it, because the size of an array cannot be derived from a pointer to its first element (Java programmers gasp in shock and disgust). The kernel simply iterates over the characters and prints them one at a time to standard output, thereby printing the message "Hello World!". Now let's look at the C++ code that interfaces with this kernel.

C++ host binding


#define __CL_ENABLE_EXCEPTIONS  // make cl.hpp throw cl::Error instead of returning error codes

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>
#include <CL/cl.hpp>

using namespace std;

int main () {

    vector<cl::Platform> platforms;
    vector<cl::Device> devices;
    try {
        // get the available platforms and the GPU devices of the first one
        cl::Platform::get(&platforms);
        platforms[0].getDevices(CL_DEVICE_TYPE_GPU, &devices);

        // create context
        cl::Context context(devices);

        // create command queue
        cl::CommandQueue queue(context, devices[0]);

        // load opencl source (the path to the .cl file containing hello_world
        // is left empty in the original; supply your own)
        ifstream cl_file("");
        string cl_string(istreambuf_iterator<char>(cl_file), (istreambuf_iterator<char>()));
        cl::Program::Sources source(1, make_pair(cl_string.c_str(), 
            cl_string.length() + 1));

        // create program
        cl::Program program(context, source);

        // compile opencl source for the selected devices
        program.build(devices);

        // load named kernel from opencl source
        cl::Kernel kernel(program, "hello_world");

        // create a message to send to kernel (12 characters; the '\0' is not sent)
        char message[] = "Hello World!";
        int messageSize = 12;

        // allocate device buffer in this context and copy the message into it
        cl::Buffer buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, 
            sizeof(char) * messageSize, message);

        // set message and its size as kernel arguments
        kernel.setArg(0, buffer);
        kernel.setArg(1, sizeof(int), &messageSize);

        // execute kernel as a single work-item
        queue.enqueueTask(kernel);

        // wait for completion
        queue.finish();

        cout << endl;
    } catch (const cl::Error& e) {
        cout << endl << e.what() << " : " << e.err() << endl;
    }
    return 0;
}

The C++ host binding code above is annotated at each step, but here is a brief overview. It first retrieves the available platforms and the GPU devices of the first one, then creates a context and a command queue: the basic OpenCL binding data structures required to interface with an OpenCL device. It then loads the OpenCL source from a separate file and creates a program from it. Building the program compiles the OpenCL source. Next it loads a specific kernel (function) from that source by name. It creates a string message on the host side, but to send it to the device it must create a buffer in the context of the same size as the message. The buffer is created and set as a kernel argument, along with the size of the message being sent.

The kernel is then enqueued for execution and the host waits for it to complete: finish() flushes all outstanding commands to the device and blocks until they have finished. Note the clean exception handling: a single try/catch wraps the entire code (exceptions are enabled by defining __CL_ENABLE_EXCEPTIONS before including cl.hpp) instead of checking the error code returned by each statement. I much prefer the C++ API to the C API, and I think you'll agree that it's more concise and cleaner. Till next time.

8 thoughts on “OpenCL Cookbook: Hello World using C++ host binding”

  1. Pingback: OpenCL Cookbook: Series Reference - Dhruba Bandopadhyay

  2. Pingback: OpenCL Cookbook: Hello World using C# Cloo binding - Dhruba Bandopadhyay

  3. Chatsiri

    Calling strlen(message) in the for-loop would be easier than sending the size of the message to the kernel function, because the setArg member functions do not pass the size of the message sent to the kernel.

  4. Anonymous

    > The kernel (OpenCL function) above receives a char array (in essence a string) from the host as well as the size of the char array (as there is no way to derive an array’s size from the array itself (Java programmers gasp in shock and disgust).

    Quick nitpicky correction – you aren’t passing an array, you’re passing a pointer. Those are two entirely different concepts. Of course you can’t derive an array’s size – you aren’t passing any arrays!

    Or in other words, the only thing you’re passing is the address of the first element of that array. Nothing more. Java programmer or not, there isn’t anything to gasp at here: it’s fairly obvious that it’s not possible to determine a memory block’s size just by knowing its address.

  5. Adam Moss

    It took me a lot of debugging to figure out that compilation was barfing on printf() being implicitly declared: printf is an OpenCL 1.2 feature and will fail on earlier versions (at least without using an extension). Worth pointing out.
    After building, you can see any build warnings or errors with something like this code:

    // check for warnings etc
    for (cl::Device& device: devices) {
        cl_build_status status;
        program.getBuildInfo(device, CL_PROGRAM_BUILD_STATUS, &status);
        cerr << "PROGRAM_BUILD_STATUS: " << status << endl;
        std::string build_log;
        program.getBuildInfo(device, CL_PROGRAM_BUILD_LOG, &build_log);
        cerr << "PROGRAM_BUILD_LOG: " << build_log << endl;
    }

  6. Soniya Tiwari

    I am running your code in Visual Studio 2010 with the Intel package. The error that I am getting is clSetKernelArg: -38. Could you please tell me what the problem could be?

    1. Marc-Olivier Andrez

      Hi Soniya,

      I got the same error on Mac OS. The ‘context’ parameter seems to be missing when constructing the buffer:
      // allocate device buffer to hold message
      cl::Buffer buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
          sizeof(char) * messageSize, const_cast<char*>(message));

      By the way, a good way to debug is to display the values of all the error codes for clSetKernelArg, e.g.:
      std::cout << "CL_INVALID_KERNEL=" << CL_INVALID_KERNEL
          << std::endl;

      I still have errors when calling "queue.finish();" and am looking for a solution…

      I hope it helps!

      Best regards,


      1. Marc-Olivier Andrez

        The other bug is in the kernel: when using printf, %c should be used instead of %s (a single character instead of a 0-terminated string). Using %s leads to an infinite loop, possibly because the terminating 0 at index 12 is not passed to the kernel (we pass 12 elements, not 13).

        The correct kernel is:
        __kernel void hello_world (__global char* message, int messageSize) {
            for (int i = 0; i < messageSize; i++) {
                printf("%c", message[i]);
            }
        }

